Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-03 Thread Bert Huijben


> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Marc L. Allen
> Sent: dinsdag 3 september 2013 15:55
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Inefficient covering index used for Subversion with
> SQLite 3.8.0
> 
> Am I understanding that, in this example, the I_NODES_PARENT is being
> chosen as the search index because... it's smaller and therefore faster to
find
> initial qualifying rows that you can then use in some sort of ordered
lookup in
> another index/table?
> 
> I'm always in awe of some of the plans a good QA comes up with, and it
> blows me away that there are cases when using a less-covering index would
> be better than a more-covering index.

It also happens to be the first index to be encountered that has wc_id as
first component. 

I'm not sure which part (being first vs being smaller) is used to make the
decision, but this is exactly why I expected this to be a simple corner case
bug instead of part of the new query optimizer design.

In Subversion we have queries that only apply on a single directory level
(where the I_NODES_PARENT is used for) vs queries that apply to an entire
tree (in most cases handled via the primary key index). Using the right
index is critical for end-user performance.


The I_NODES_MOVED index is only used when trees are moved, which is an
uncommon operation, but obtaining information about moves is performance
critical in certain scenarios. I hope we will start using the partial index
support for this with the next release. That should also directly invalidate
using this index for these optimizations .

Bert 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-03 Thread Marc L. Allen
Am I understanding that, in this example, the I_NODES_PARENT is being chosen as 
the search index because... it's smaller and therefore faster to find initial 
qualifying rows that you can then use in some sort of ordered lookup in another 
index/table?

I'm always in awe of some of the plans a good QA comes up with, and it blows me 
away that there are cases when using a less-covering index would be better than 
a more-covering index.


This email and any attachments are only for use by the intended recipient(s) 
and may contain legally privileged, confidential, proprietary or otherwise 
private information. Any unauthorized use, reproduction, dissemination, 
distribution or other disclosure of the contents of this e-mail or its 
attachments is strictly prohibited. If you have received this email in error, 
please notify the sender immediately and delete the original.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-03 Thread Bert Huijben


> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Richard Hipp
> Sent: dinsdag 3 september 2013 02:12
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Inefficient covering index used for Subversion with
> SQLite 3.8.0
> 
> On Mon, Sep 2, 2013 at 10:58 AM, Bert Huijben <rhuij...@apache.org>
> wrote:
> 
> > We anticipate that the wc_id column will be used more in future versions
> > and
> > I had hoped that the assumption that a better index match (matching more
> > usable columns) would always be preferable over one that uses less
> columns.
> >
> > All indexes have the wc_id column first, so I don't see a good reason
that
> > Sqlite in this case prefers only this column of the specific (bad) index
> > over any other index that can help with the rest of the query.
> >
> > This optimization added in 3.7.12 made it easier for us to write clearer
> > well performing queries, without any hassle. . . We can't really assume
our
> > users to run a specific version of Sqlite (as most unix distributions
> > compile sqlite themselves for products like ours)...
> >
> 
> As I've alluded to before, the issue here boils down to the "shape" of the
> content of the database file.  By "shape", I mean the distribution of
> values in each table, and in particular how well each index is able to
> narrow down a search.  Knowledge of the shape of the data is critically
> important in choosing the best query plan when there are two or more
> potentially good query plans.  It isn't possible to make good decisions
> about query plans without knowing the shape of the data.

I understand.
A long long time ago I implemented similar indexing myself for a library
system, where I couldn't even use the whole keys as an index caused by the
storage limitations at the time. I'm glad we don't have that 1980's limits
any more.


At Subversion Our databases aren't well formed by the relational standards,
but everything worked reasonably well with Sqlite 3.7 before. 
I then tuned all queries for Sqlite 3.7.12-3.7.18 where I could get all of
our queries use the right indexes without much trouble.

> When you run ANALYZE, SQLite records information in the sqlite_stat1 table
> that gives it a good picture of the shape of the data.  If you have not
run
> ANALYZE or if you have dropped or deleted the sqlite_stat1 table, then
> SQLite has no information about the true shape of the data, so it has to
> guess.  The guess that SQLite has made for the the past decade is that the
> left-most column of each index does a pretty good job of narrowing down
> the
> search and then subsequent columns merely refine that search slightly.
> This is just a wild guess, of course.  But it seems to have worked
> remarkably well in a wide variety of databases in literally millions of
> applications over the past ten years.

I don't like that I have to do this, but this appears to be the best
maintainable option for the long term future

> The issue at hand is that the guess that SQLite is making on the shape of
> unanalyzed databases is not working very well for SVN.  The problem is
that
> the left-most column of many of the indices do a very poor job of
narrowing
> down the search, in violation of the assumption made for the standard
> guess.
> 
> As I've pointed out, one could just run ANALYZE to populate the
> sqlite_stat1 table, then SQLite will know the database shape and will pick
> good plans.  Or one can manually populate the sqlite_stat1 table with
> something close to the anticipated shape of the database.  But Bert says
he
> doesn't want to do any of that.
> 
> So I've come up with a new, experimental compile-time option:
> 
> -DSQLITE_DEFAULT_INDEX_SHAPE=1
> 
> Rebuilding with the compile-time option above, using code from the
> index-shape-1 branch (http://www.sqlite.org/src/timeline?r=index-shape-1)
> causes SQLite to make a much more pessimistic guess about the
> effectiveness
> of unanalyzed indices.  In particular, using the new default shape, SQLite
> assumes that you need to use all columns of an index in order to narrow
the
> search significantly.  The number of rows that need to be searched
> increases geometrically based on the number of unused columns on the
> right-hand side of the index.

I can confirm that this patch updates all our queries to work with a query
plan at least as efficient as with Sqlite 3.7.12-3.7.18.

I will spend some time over the next few days to see if I can get more
details on the difference over the different versions by using a standard
install, a _stat1 table and this patch by extending our test infrastructure.

> Leave the SQLITE_DEFAULT_I

Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-02 Thread Hick Gunter
I think this may help me significantly, as none of the application SELECTs use 
native tables (only the schema metadata analyzer/checker uses native tables) 
but quite a few virtual tables (which cannot be ANALYZED).

-Ursprüngliche Nachricht-
Von: Richard Hipp [mailto:d...@sqlite.org]
Gesendet: Dienstag, 03. September 2013 02:12
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] Inefficient covering index used for Subversion with 
SQLite 3.8.0

On Mon, Sep 2, 2013 at 10:58 AM, Bert Huijben <rhuij...@apache.org> wrote:

> We anticipate that the wc_id column will be used more in future
> versions and I had hoped that the assumption that a better index match
> (matching more usable columns) would always be preferable over one
> that uses less columns.
>
> All indexes have the wc_id column first, so I don't see a good reason
> that Sqlite in this case prefers only this column of the specific
> (bad) index over any other index that can help with the rest of the query.
>
> This optimization added in 3.7.12 made it easier for us to write
> clearer well performing queries, without any hassle. . . We can't
> really assume our users to run a specific version of Sqlite (as most
> unix distributions compile sqlite themselves for products like ours)...
>

As I've alluded to before, the issue here boils down to the "shape" of the 
content of the database file.  By "shape", I mean the distribution of values in 
each table, and in particular how well each index is able to narrow down a 
search.  Knowledge of the shape of the data is critically important in choosing 
the best query plan when there are two or more potentially good query plans.  
It isn't possible to make good decisions about query plans without knowing the 
shape of the data.

When you run ANALYZE, SQLite records information in the sqlite_stat1 table that 
gives it a good picture of the shape of the data.  If you have not run ANALYZE 
or if you have dropped or deleted the sqlite_stat1 table, then SQLite has no 
information about the true shape of the data, so it has to guess.  The guess 
that SQLite has made for the the past decade is that the left-most column of 
each index does a pretty good job of narrowing down the search and then 
subsequent columns merely refine that search slightly.
This is just a wild guess, of course.  But it seems to have worked remarkably 
well in a wide variety of databases in literally millions of applications over 
the past ten years.

The issue at hand is that the guess that SQLite is making on the shape of 
unanalyzed databases is not working very well for SVN.  The problem is that the 
left-most column of many of the indices do a very poor job of narrowing down 
the search, in violation of the assumption made for the standard guess.

As I've pointed out, one could just run ANALYZE to populate the
sqlite_stat1 table, then SQLite will know the database shape and will pick good 
plans.  Or one can manually populate the sqlite_stat1 table with something 
close to the anticipated shape of the database.  But Bert says he doesn't want 
to do any of that.

So I've come up with a new, experimental compile-time option:

-DSQLITE_DEFAULT_INDEX_SHAPE=1

Rebuilding with the compile-time option above, using code from the
index-shape-1 branch (http://www.sqlite.org/src/timeline?r=index-shape-1)
causes SQLite to make a much more pessimistic guess about the effectiveness of 
unanalyzed indices.  In particular, using the new default shape, SQLite assumes 
that you need to use all columns of an index in order to narrow the search 
significantly.  The number of rows that need to be searched increases 
geometrically based on the number of unused columns on the right-hand side of 
the index.

Leave the SQLITE_DEFAULT_INDEX_SHAPE unset or set it to 0 to get the legacy 
assumption about unanalyzed indices.  Set it to 1 for the new assumption that 
seems to work better for the test queries I've run on SVN.  If I decide to 
include this compile-time option in a release, future releases might define 
other default index shapes using larger integers.

I am personally skeptical of this idea of having a compile-time option to 
change the assumptions about the shape of unanalyzed content.  After all, the 
legacy shape assumption has worked remarkably well up until now.  And, more 
importantly, you can always run ANALYZE or otherwise populate the
sqlite_stat1 table to define the shape of the data very precisely.   It
seems to me that if you want to tweak the data shape assumptions, you should 
use the mechanism that is already in place to do that (namely, the
sqlite_stat1 table).   But I'm open to including this new compile-time
option if it seems like it might make things easier for developers.

Please try this out (by downloading sources from 
http://www.sqlite.org/src/info/d8daaba7da or by applying the simple patch shown 
there to any 3.7.x or 3.8

Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-02 Thread Richard Hipp
On Mon, Sep 2, 2013 at 10:58 AM, Bert Huijben  wrote:

> We anticipate that the wc_id column will be used more in future versions
> and
> I had hoped that the assumption that a better index match (matching more
> usable columns) would always be preferable over one that uses less columns.
>
> All indexes have the wc_id column first, so I don't see a good reason that
> Sqlite in this case prefers only this column of the specific (bad) index
> over any other index that can help with the rest of the query.
>
> This optimization added in 3.7.12 made it easier for us to write clearer
> well performing queries, without any hassle. . . We can't really assume our
> users to run a specific version of Sqlite (as most unix distributions
> compile sqlite themselves for products like ours)...
>

As I've alluded to before, the issue here boils down to the "shape" of the
content of the database file.  By "shape", I mean the distribution of
values in each table, and in particular how well each index is able to
narrow down a search.  Knowledge of the shape of the data is critically
important in choosing the best query plan when there are two or more
potentially good query plans.  It isn't possible to make good decisions
about query plans without knowing the shape of the data.

When you run ANALYZE, SQLite records information in the sqlite_stat1 table
that gives it a good picture of the shape of the data.  If you have not run
ANALYZE or if you have dropped or deleted the sqlite_stat1 table, then
SQLite has no information about the true shape of the data, so it has to
guess.  The guess that SQLite has made for the the past decade is that the
left-most column of each index does a pretty good job of narrowing down the
search and then subsequent columns merely refine that search slightly.
This is just a wild guess, of course.  But it seems to have worked
remarkably well in a wide variety of databases in literally millions of
applications over the past ten years.

The issue at hand is that the guess that SQLite is making on the shape of
unanalyzed databases is not working very well for SVN.  The problem is that
the left-most column of many of the indices do a very poor job of narrowing
down the search, in violation of the assumption made for the standard guess.

As I've pointed out, one could just run ANALYZE to populate the
sqlite_stat1 table, then SQLite will know the database shape and will pick
good plans.  Or one can manually populate the sqlite_stat1 table with
something close to the anticipated shape of the database.  But Bert says he
doesn't want to do any of that.

So I've come up with a new, experimental compile-time option:

-DSQLITE_DEFAULT_INDEX_SHAPE=1

Rebuilding with the compile-time option above, using code from the
index-shape-1 branch (http://www.sqlite.org/src/timeline?r=index-shape-1)
causes SQLite to make a much more pessimistic guess about the effectiveness
of unanalyzed indices.  In particular, using the new default shape, SQLite
assumes that you need to use all columns of an index in order to narrow the
search significantly.  The number of rows that need to be searched
increases geometrically based on the number of unused columns on the
right-hand side of the index.

Leave the SQLITE_DEFAULT_INDEX_SHAPE unset or set it to 0 to get the legacy
assumption about unanalyzed indices.  Set it to 1 for the new assumption
that seems to work better for the test queries I've run on SVN.  If I
decide to include this compile-time option in a release, future releases
might define other default index shapes using larger integers.

I am personally skeptical of this idea of having a compile-time option to
change the assumptions about the shape of unanalyzed content.  After all,
the legacy shape assumption has worked remarkably well up until now.  And,
more importantly, you can always run ANALYZE or otherwise populate the
sqlite_stat1 table to define the shape of the data very precisely.   It
seems to me that if you want to tweak the data shape assumptions, you
should use the mechanism that is already in place to do that (namely, the
sqlite_stat1 table).   But I'm open to including this new compile-time
option if it seems like it might make things easier for developers.

Please try this out (by downloading sources from
http://www.sqlite.org/src/info/d8daaba7da or by applying the simple patch
shown there to any 3.7.x or 3.8.x version of SQLite) and let me know what
you think.

Your feedback is appreciated.


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-02 Thread Simon Slavin

On 2 Sep 2013, at 9:54pm, Kees Nuyt  wrote:

> I would reverse 2) and 3), otherwise ANALYZE has nothing to analyze.

Of course.  I thought it right and wrote it wrong.  Thanks for the correction.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-02 Thread Kees Nuyt
On Mon, 2 Sep 2013 17:32:31 +0100, Simon Slavin 
wrote:

>
>On 2 Sep 2013, at 3:58pm, Bert Huijben  wrote:
>
>> We anticipate that the wc_id column will be used more in future versions and
>> I had hoped that the assumption that a better index match (matching more
>> usable columns) would always be preferable over one that uses less columns. 
>
>If you want to test for good indexes and don't understand how indexing will 
>work on your data ...
>
>1) Put in a convincing set of data
>2) Run ANALYZE
>3) Make up a ton of indexes, indexing lots of tables in many different orders.
>4) Find the SELECT and UPDATEs you run most often and use EXPLAIN QUERY PLAN 
>on them.
>5) Keep the indexes the query plans mention, and drop the ones they don't 
>mention.

I would reverse 2) and 3), otherwise ANALYZE has nothing to analyze.
The query plans of 4) are only valid with results of ANALYZE.
Without sqlite_stat1 ,  the SQLite optimizer may choose different
indexes at runtime. So, this strategy only works if Bert decides to
include a populated sqlite_stat1 table in the SVN init code.


>This stuff can't be done by theory: there are too many
>possibilities and it requires too good an understanding 
>of how the query planner works for non-experts.  
>There's no substitute for actual testing.
>
>Simon.

-- 
Groet, Cordialement, Pozdrawiam, Regards,

Kees Nuyt

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-02 Thread Simon Slavin

On 2 Sep 2013, at 3:58pm, Bert Huijben  wrote:

> We anticipate that the wc_id column will be used more in future versions and
> I had hoped that the assumption that a better index match (matching more
> usable columns) would always be preferable over one that uses less columns. 

If you want to test for good indexes and don't understand how indexing will 
work on your data ...

1) Put in a convincing set of data
2) Run ANALYZE
3) Make up a ton of indexes, indexing lots of tables in many different orders.
4) Find the SELECT and UPDATEs you run most often and use EXPLAIN QUERY PLAN on 
them.
5) Keep the indexes the query plans mention, and drop the ones they don't 
mention.

This stuff can't be done by theory: there are too many possibilities and it 
requires too good an understanding of how the query planner works for 
non-experts.  There's no substitute for actual testing.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-02 Thread Richard Hipp
On Mon, Sep 2, 2013 at 10:58 AM, Bert Huijben  wrote:

> We anticipate that the wc_id column will be used more in future versions
> and
> I had hoped that the assumption that a better index match (matching more
> usable columns) would always be preferable over one that uses less columns.
>
> All indexes have the wc_id column first, so I don't see a good reason that
> Sqlite in this case prefers only this column of the specific (bad) index
> over any other index that can help with the rest of the query.
>

Thank you; I have your complaint.  I will try to come up with some new
mechanism to solve your issues without the use of ANALYZE.

Please recognize that there do exists real-world databases where it is more
efficient to use a single lookup on a single column of an index rather than
doing multiple lookups on the same index using multiple columns, which is
the choice that SQLite faces in your scenario.  It is not clear to me how
SQLite is suppose to tell one case from the other without access to
ANALYZE-like meta-information that describes the shape of the database
content.  But I will investigate and see what I can come up with.

Please also recognize that if I change SQLite to work better with your data
shape, that will also likely make it slower on some other application where
the data has a very different shape, for example where the left-most index
column is distinctive but the second column is not.  The question becomes
whether or not this will be a worthwhile tradeoff.


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-02 Thread Richard Hipp
On Mon, Sep 2, 2013 at 10:58 AM, Bert Huijben  wrote:


> We can't really assume our
> users to run a specific version of Sqlite (as most unix distributions
> compile sqlite themselves for products like ours)...
>

Yes you can:  statically link.  Add "sqlite3.c" and "sqlite3.h" to your
source tree so that people who build from sources do not have to download
(a particular version of) sqlite.  Mozilla works this way, as does Fossil.
I think Chromium does too, though I'd need to check to be sure.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-02 Thread Bert Huijben


> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Richard Hipp
> Sent: maandag 2 september 2013 16:11
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Inefficient covering index used for Subversion with
> SQLite 3.8.0
> 
> On Mon, Sep 2, 2013 at 9:42 AM, Bert Huijben <rhuij...@apache.org>
> wrote:
> 
> >
> > My question is more like: Should I see this as a light regression that
most
> > likely will be resolved in a future version, or as a buggy assumption in
> > our
> > code that I should work around? It should be possible via explicit
'INDEXED
> > BY' clauses and/or breaking the query in separate parts. (Or for future
> > versions by pre-filling the stat table)
> >
> 
> I think rules #3 and #4 (especially #4) of
> http://www.sqlite.org/queryplanner-ng.html#howtofix are applicable here.
> 
> Without the results of ANALYZE, SQLite assumes that the left-most column
of
> an index will narrow down a search to about 10 rows.  That is clearly not
> the case with your schema (where the left-most column of several indices,
> wc_id, is always the same value).  Hence, SQLite really needs the results
> of ANALYZE to work efficiently.
> 
> But that does not mean you have to run ANALYZE on each installation.
> Instead, you can install a pre-computed sqlite3_stat1 table (the larger of
> the two you emailed above would suffice) whenever you create a new
> database.  To do this, just run:
> 
>  ANALYZE sqlite_master;   -- Create the sqlite_stat1 table
>  INSERT INTO sqlite_stat1 VALUES(...);  -- repeat as necessary to fill
> in the table.
>  ANALYZE sqlite_master;   -- Load the sqlite_stat1 table into the
> optimizer.
> 
> The above steps only need to be done once, when the database is first
> created, and can be part of the same script that creates all the other
> tables, indices, triggers, and views in your database.  You should never
> need to run ANALYZE again (assuming the "shape" of your data is always
> roughly the same).  The sqlite_stat1 table created here gives SQLite all
> the information it needs to be able to figure out the best way to handle
> queries in your peculiar usage pattern.

Thanks.

We anticipate that the wc_id column will be used more in future versions and
I had hoped that the assumption that a better index match (matching more
usable columns) would always be preferable over one that uses less columns. 

All indexes have the wc_id column first, so I don't see a good reason that
Sqlite in this case prefers only this column of the specific (bad) index
over any other index that can help with the rest of the query. 

This optimization added in 3.7.12 made it easier for us to write clearer
well performing queries, without any hassle. . . We can't really assume our
users to run a specific version of Sqlite (as most unix distributions
compile sqlite themselves for products like ours)...



Most Subversion developers are not database experts, so adding this
initialization will add more code that will have to be explained over and
over again to developers that don't understand how a query optimizer (or a
B-Tree) works.

Thanks for the call ANALYZE twice trick. That avoids having to declare the
table ourselves.

Bert

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-02 Thread Richard Hipp
On Mon, Sep 2, 2013 at 9:42 AM, Bert Huijben  wrote:

>
> My question is more like: Should I see this as a light regression that most
> likely will be resolved in a future version, or as a buggy assumption in
> our
> code that I should work around? It should be possible via explicit 'INDEXED
> BY' clauses and/or breaking the query in separate parts. (Or for future
> versions by pre-filling the stat table)
>

I think rules #3 and #4 (especially #4) of
http://www.sqlite.org/queryplanner-ng.html#howtofix are applicable here.

Without the results of ANALYZE, SQLite assumes that the left-most column of
an index will narrow down a search to about 10 rows.  That is clearly not
the case with your schema (where the left-most column of several indices,
wc_id, is always the same value).  Hence, SQLite really needs the results
of ANALYZE to work efficiently.

But that does not mean you have to run ANALYZE on each installation.
Instead, you can install a pre-computed sqlite3_stat1 table (the larger of
the two you emailed above would suffice) whenever you create a new
database.  To do this, just run:

 ANALYZE sqlite_master;   -- Create the sqlite_stat1 table
 INSERT INTO sqlite_stat1 VALUES(...);  -- repeat as necessary to fill
in the table.
 ANALYZE sqlite_master;   -- Load the sqlite_stat1 table into the
optimizer.

The above steps only need to be done once, when the database is first
created, and can be part of the same script that creates all the other
tables, indices, triggers, and views in your database.  You should never
need to run ANALYZE again (assuming the "shape" of your data is always
roughly the same).  The sqlite_stat1 table created here gives SQLite all
the information it needs to be able to figure out the best way to handle
queries in your peculiar usage pattern.

FWIW, please note that the I_NODES_MOVED index in your schema is completely
useless, since all values of all columns are always the same.  That index
is dead weight.  You might want to drop that index, if the shape of your
data really is always similar to the examples you supplied us.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-09-02 Thread Bert Huijben


> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Richard Hipp
> Sent: vrijdag 30 augustus 2013 21:41
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Inefficient covering index used for Subversion with
> SQLite 3.8.0
> 
> On Fri, Aug 30, 2013 at 3:31 PM, Bert Huijben <rhuij...@apache.org> wrote:
> 
> > The analyze on the very small database (which I used for the comparison
> > between 3.7 and 3.8) is:
> >
> 
> Thanks for the data.

I can share the/an actual database if that would help. (The wc.db describes
a checkout of a 100% public subversion repository, so there is nothing
secret in it)

For myself it would be interesting to know how I should look at the OR
optimization which was available for our use cases between Sqlite 3.7.12 and
3.7.18.

My question is more like: Should I see this as a light regression that most
likely will be resolved in a future version, or as a buggy assumption in our
code that I should work around? It should be possible via explicit 'INDEXED
BY' clauses and/or breaking the query in separate parts. (Or for future
versions by pre-filling the stat table)

Bert

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-08-30 Thread Richard Hipp
On Fri, Aug 30, 2013 at 3:31 PM, Bert Huijben  wrote:

> The analyze on the very small database (which I used for the comparison
> between 3.7 and 3.8) is:
>

Thanks for the data.
-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-08-30 Thread Bert Huijben
> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Richard Hipp
> Sent: vrijdag 30 augustus 2013 21:16
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Inefficient covering index used for Subversion with
> SQLite 3.8.0
> 
> On Fri, Aug 30, 2013 at 2:44 PM, <rhuij...@apache.org> wrote:
> 
> >
> > The query
> > [[
> > DELETE FROM nodes WHERE wc_id = ?1   AND (local_relpath = ?2 OR
> > (((local_relpath) > (CASE (?2) WHEN '' THEN '' ELSE (?2) || '/' END))
AND
> > ((local_relpath) < CASE (?2) WHEN '' THEN X'' ELSE (?2) || '0'
END)))
> > AND op_depth = ?3
> > ]]
> >
> 
> 
> > Is handled by 3.7.17 as:
> > 0|0|0|SEARCH TABLE nodes USING COVERING INDEX
> sqlite_autoindex_NODES_1
> > (wc_id=? AND local_relpath=? AND op_depth=?) (~1 rows)
> > 0|0|0|SEARCH TABLE nodes USING COVERING INDEX
> sqlite_autoindex_NODES_1
> > (wc_id=? AND local_relpath>? AND local_relpath >
> > Which I read as two separate operations, under the 'OR' optimization
> >
> > But 3.8.0.1 does:
> > 0|0|0|SEARCH TABLE nodes USING COVERING INDEX I_NODES_PARENT
> (wc_id=?)
> >
> > Which in our case is far worse than using the primary key on the normal
> > table as wc_id is constant and local_relpath +- our primary key.
> >
> > But the query planner has know way of knowing that wc_id is always the
> same value, unless you run ANALYZE.  Can you do that, please:  Run
> ANALYZE,
> then send in the content of the resulting "sqlite_stat1" table.

The analyze on the very small database (which I used for the comparison
between 3.7 and 3.8) is:
[[
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
ANALYZE sqlite_master;
INSERT INTO "sqlite_stat1" VALUES('REPOSITORY','I_ROOT','1 1');
INSERT INTO "sqlite_stat1" VALUES('REPOSITORY','I_UUID','1 1');
INSERT INTO "sqlite_stat1"
VALUES('REPOSITORY','sqlite_autoindex_REPOSITORY_1','1 1');
INSERT INTO "sqlite_stat1" VALUES('WCROOT','I_LOCAL_ABSPATH','1 1');
INSERT INTO "sqlite_stat1" VALUES('WCROOT','sqlite_autoindex_WCROOT_1','1
1');
INSERT INTO "sqlite_stat1" VALUES('PRISTINE','I_PRISTINE_MD5','45 1');
INSERT INTO "sqlite_stat1"
VALUES('PRISTINE','sqlite_autoindex_PRISTINE_1','45 1');
INSERT INTO "sqlite_stat1" VALUES('NODES','I_NODES_PARENT','7 7 4 1 1');
INSERT INTO "sqlite_stat1" VALUES('NODES','I_NODES_MOVED','7 7 7 7');
INSERT INTO "sqlite_stat1" VALUES('NODES','sqlite_autoindex_NODES_1','7 7 1
1');
COMMIT;
]]

A larger Subversion working copy gets
[[
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
ANALYZE sqlite_master;
INSERT INTO "sqlite_stat1" VALUES('REPOSITORY','I_ROOT','2 1');
INSERT INTO "sqlite_stat1" VALUES('REPOSITORY','I_UUID','2 2');
INSERT INTO "sqlite_stat1"
VALUES('REPOSITORY','sqlite_autoindex_REPOSITORY_1','2 1');
INSERT INTO "sqlite_stat1" VALUES('WCROOT','I_LOCAL_ABSPATH','1 1');
INSERT INTO "sqlite_stat1" VALUES('WCROOT','sqlite_autoindex_WCROOT_1','1
1');
INSERT INTO "sqlite_stat1" VALUES('PRISTINE','I_PRISTINE_MD5','9451 1');
INSERT INTO "sqlite_stat1"
VALUES('PRISTINE','sqlite_autoindex_PRISTINE_1','9451 1');
INSERT INTO "sqlite_stat1" VALUES('NODES','I_NODES_PARENT','18420 18420 9 1
1');
INSERT INTO "sqlite_stat1" VALUES('NODES','I_NODES_MOVED','18420 18420 18420
18420');
INSERT INTO "sqlite_stat1" VALUES('NODES','sqlite_autoindex_NODES_1','18420
18420 1 1');
COMMIT;
]]

But in general we don't run analyze in the working copies. We would most
likely have to insert some initial analyze data for our use cases as our
users always start from an empty database and there is no real time where we
can run analyze

As I guessed after your question 3.8.0 optimizes this correctly after
analyzing; 3.7.17 also without.

Bert

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-08-30 Thread rhuijben


> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Richard Hipp
> Sent: vrijdag 30 augustus 2013 19:37
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Inefficient covering index used for Subversion with
> SQLite 3.8.0
> 
> On Fri, Aug 30, 2013 at 12:39 PM, <rhuij...@apache.org> wrote:
> 
> > Hi,
> >
> > Using the Subversion 1.7 / 1.8 wc.db schema I get quite different
results
> > from the query planner for several of our queries, where the difference
in
> > performance is quite huge.
> >
> 
> Thanks for the test case.  An adjustment to the query planner so that it
> works better for  your test case has been checked in here:
> http://www.sqlite.org/src/info/79e458ef7a

Thanks. This fix appears to positively affect quite a few of our queries.


There is another common pattern that changes between 3.7.17, which I hoped
was caused by the same problem.

The query
[[
DELETE FROM nodes WHERE wc_id = ?1   AND (local_relpath = ?2 OR
(((local_relpath) > (CASE (?2) WHEN '' THEN '' ELSE (?2) || '/' END)) AND
((local_relpath) < CASE (?2) WHEN '' THEN X'' ELSE (?2) || '0' END)))
AND op_depth = ?3
]]

Which in our sourcecode we generate from the more readable form
[[
DELETE FROM nodes
WHERE wc_id = ?1 
  AND (local_relpath = ?2 OR IS_STRICT_DESCENDANT_OF(local_relpath, ?2))
  AND op_depth = ?3
]]
(which is a valid transformation given the path rules in Subversion)

Is handled by 3.7.17 as:
0|0|0|SEARCH TABLE nodes USING COVERING INDEX sqlite_autoindex_NODES_1
(wc_id=? AND local_relpath=? AND op_depth=?) (~1 rows)
0|0|0|SEARCH TABLE nodes USING COVERING INDEX sqlite_autoindex_NODES_1
(wc_id=? AND local_relpath>? AND local_relpathhttp://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-08-30 Thread Richard Hipp
On Fri, Aug 30, 2013 at 2:44 PM,  wrote:

>
> The query
> [[
> DELETE FROM nodes WHERE wc_id = ?1   AND (local_relpath = ?2 OR
> (((local_relpath) > (CASE (?2) WHEN '' THEN '' ELSE (?2) || '/' END)) AND
> ((local_relpath) < CASE (?2) WHEN '' THEN X'' ELSE (?2) || '0' END)))
> AND op_depth = ?3
> ]]
>


> Is handled by 3.7.17 as:
> 0|0|0|SEARCH TABLE nodes USING COVERING INDEX sqlite_autoindex_NODES_1
> (wc_id=? AND local_relpath=? AND op_depth=?) (~1 rows)
> 0|0|0|SEARCH TABLE nodes USING COVERING INDEX sqlite_autoindex_NODES_1
> (wc_id=? AND local_relpath>? AND local_relpath
> Which I read as two separate operations, under the 'OR' optimization
>
> But 3.8.0.1 does:
> 0|0|0|SEARCH TABLE nodes USING COVERING INDEX I_NODES_PARENT (wc_id=?)
>
> Which in our case is far worse than using the primary key on the normal
> table as wc_id is constant and local_relpath +- our primary key.
>
> But the query planner has know way of knowing that wc_id is always the
same value, unless you run ANALYZE.  Can you do that, please:  Run ANALYZE,
then send in the content of the resulting "sqlite_stat1" table.
-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Inefficient covering index used for Subversion with SQLite 3.8.0

2013-08-30 Thread Richard Hipp
On Fri, Aug 30, 2013 at 12:39 PM,  wrote:

> Hi,
>
> Using the Subversion 1.7 / 1.8 wc.db schema I get quite different results
> from the query planner for several of our queries, where the difference in
> performance is quite huge.
>

Thanks for the test case.  An adjustment to the query planner so that it
works better for  your test case has been checked in here:
http://www.sqlite.org/src/info/79e458ef7a

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users