Re: [Monotone-devel] Monotone speedup by adding additional database indices?

2007-10-11 Thread Nathaniel Smith
On Thu, Oct 11, 2007 at 11:17:17AM +0200, Markus Schiltknecht wrote:
> But I doubt very much that there are any gains. The index on 
> revision_certs(id) should be enough, as we have only few (in most cases 
> four) revision_certs per revision id.

Measurement beats guessing :-).

> I'm not an expert reading these plans, but for sure both variants use an 
> index scan and not a sequential scan. Thus I don't think it's worth 
> changing these revision_certs indices.

My question was which index it was choosing to use; it is trying to
satisfy a constraint like "A=1, B=2, C=3", and it can pick to either
use an index on A then sequential scan for (B, C), or it can use an
index on (B, C) and then sequential scan for A.  (We do have an index
on (name, value).)  A plausible heuristic would be to use the index
that lets you satisfy more parts of the constraint, but that heuristic
would be wrong in this case, since in fact there are many certs with
NAME=branch, VALUE=net.venge.monotone, but few certs with ID=da39...

-- Nathaniel

-- 
The best book on programming is still Strunk and White.


___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] Monotone speedup by adding additional database indices?

2007-10-11 Thread Markus Schiltknecht

Hi,

Ralf S. Engelschall wrote:

Monotone showed just _warning_ messages but operated just fine.


Strange, the following error message should be returned:

mtn: misuse: /home/markus/.monotone.36.ind.db appears to be a monotone 
database, but this version of

mtn: misuse: monotone does not recognize its schema.
mtn: misuse: you probably need a newer version of monotone.


In general, yes. The (id, name, value) index should be consulted
by SQLite also in case only "id" has to be looked up for a query.


As a quick test, I did add the new, combined index and dropped the old 
one (on id only). Doing that, my db has grown by about 2.4 percent:


-rw-r--r-- 1 markus markus 617M 2007-10-11 10:33 .monotone.36.db
-rw-r--r-- 1 markus markus 632M 2007-10-11 10:38 .monotone.36.ind.db


But I doubt very much that there are any gains. The index on 
revision_certs(id) should be enough, as we have only few (in most cases 
four) revision_certs per revision id.


And AFAICT sqlite optimizes properly, at least the explain result looks 
good:


# sqlite3 .monotone.36.db "EXPLAIN SELECT id, name, value FROM 
revision_certs WHERE id = 'abc' AND name = 'def' AND value = 'ghi';" > 
e1.txt
# sqlite3 .monotone.36.ind.db "EXPLAIN SELECT id, name, value FROM 
revision_certs WHERE id = 'abc' AND name = 'def' AND value = 'ghi';" > 
e2.txt


The two execution plans differ only slightly:

*** e1.txt  2007-10-11 10:53:15.492029559 +0200
--- e2.txt  2007-10-11 10:53:23.128009459 +0200
***
*** 1,26 
  0|Goto|0|22|
  1|Integer|0|0|
! 2|OpenRead|1|28|keyinfo(5,BINARY,BINARY)
! 3|SetNumColumns|1|6|
! 4|String8|0|0|def
  5|IsNull|-1|20|
! 6|String8|0|0|abc
  7|IsNull|-2|20|
  8|String8|0|0|ghi
  9|IsNull|-3|20|
! 10|MakeRecord|3|0|b
  11|MemStore|0|0|
  12|MoveGe|1|20|
  13|MemLoad|0|0|
  14|IdxGE|1|20|+
! 15|Column|1|1|
! 16|Column|1|0|
  17|Column|1|2|
  18|Callback|3|0|
  19|Next|1|13|
  20|Close|1|0|
  21|Halt|0|0|
  22|Transaction|0|0|
! 23|VerifyCookie|0|26|
  24|Goto|0|1|
  25|Noop|0|0|
--- 1,26 
  0|Goto|0|22|
  1|Integer|0|0|
! 2|OpenRead|1|39|keyinfo(3,BINARY,BINARY)
! 3|SetNumColumns|1|4|
! 4|String8|0|0|abc
  5|IsNull|-1|20|
! 6|String8|0|0|def
  7|IsNull|-2|20|
  8|String8|0|0|ghi
  9|IsNull|-3|20|
! 10|MakeRecord|3|0|bbb
  11|MemStore|0|0|
  12|MoveGe|1|20|
  13|MemLoad|0|0|
  14|IdxGE|1|20|+
! 15|Column|1|0|
! 16|Column|1|1|
  17|Column|1|2|
  18|Callback|3|0|
  19|Next|1|13|
  20|Close|1|0|
  21|Halt|0|0|
  22|Transaction|0|0|
! 23|VerifyCookie|0|29|
  24|Goto|0|1|
  25|Noop|0|0|


I'm not an expert reading these plans, but for sure both variants use an 
index scan and not a sequential scan. Thus I don't think it's worth 
changing these revision_certs indices.


Regards

Markus



___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] Monotone speedup by adding additional database indices?

2007-10-11 Thread Ralf S. Engelschall
On Wed, Oct 10, 2007, Nathaniel Smith wrote:

> [...]
> [Err... though... waittasec.  Shouldn't the database upgrade machinery
> be causing mtn to bomb out on your modified database ("unrecognized
> schema version" or the like)?  Did you disable that or something?]

Monotone showed just _warning_ messages but operated just fine.

> I do wonder where the benefit is coming from in this particular case.
> The index on public_keys is almost certainly just irrelevant (though
> it doesn't hurt and adds scalability), since you probably don't have
> more than, say, 100 keys in there, and the whole table is almost
> certainly cached.
>
> And on revision_certs we already have:
>
> CREATE INDEX revision_certs__id ON revision_certs (id);
> CREATE INDEX revision_certs__name_value ON revision_certs (name, value);
>
> So if we add an index on (id, ) we should remove the index
> on simple (id).

In general, yes. The (id, name, value) index should be consulted
by SQLite also in case only "id" has to be looked up for a query.

> But also that index on simple (id) should be making
> that search fast already, because once you've located the certs
> for a particular rev, then you only have to do a sequential scan over
> 4 of them (in most cases) to find any particular one.  Perhaps
> sqlite's optimizer has gotten *too* smart and is picking the wrong
> index, doing a lookup by (name, value) and then sequential scan to
> match the id?
> [...]

We can check this with a manual "EXPLAIN ", I think.

   Ralf S. Engelschall
   [EMAIL PROTECTED]
   www.engelschall.com



___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] Monotone speedup by adding additional database indices?

2007-10-10 Thread Nathaniel Smith
On Wed, Oct 10, 2007 at 08:07:26PM +0200, Ralf S. Engelschall wrote:
> What do we think? Should we investigate further and especially add
> additional indices like the above to the Monotone database schema? Or is
> there consensus that this type of speed optimization is just the root of
> furthcoming evil and at least at this time should be still ignored at
> all...

No way, 5x speedups for 1 line of code = ++good.  And while once upon
a time the database upgrade machinery was not savvy to indexes, that
got fixed long ago (exactly to add some similar indexes, IIRC).  We
just don't revisit whether we have the right indexes very often :-).
(This isn't to override any more specific problems people might raise
in this thread, though I didn't see any obvious showstoppers so far.)

[Err... though... waittasec.  Shouldn't the database upgrade machinery
be causing mtn to bomb out on your modified database ("unrecognized
schema version" or the like)?  Did you disable that or something?]

I do wonder where the benefit is coming from in this particular case.
The index on public_keys is almost certainly just irrelevant (though
it doesn't hurt and adds scalability), since you probably don't have
more than, say, 100 keys in there, and the whole table is almost
certainly cached.

And on revision_certs we already have:

CREATE INDEX revision_certs__id ON revision_certs (id);
CREATE INDEX revision_certs__name_value ON revision_certs (name, value);

So if we add an index on (id, ) we should remove the index
on simple (id).  But also that index on simple (id) should be making
that search fast already, because once you've located the certs
for a particular rev, then you only have to do a sequential scan over
4 of them (in most cases) to find any particular one.  Perhaps
sqlite's optimizer has gotten *too* smart and is picking the wrong
index, doing a lookup by (name, value) and then sequential scan to
match the id?

...It may not be worth answering these questions as opposed to just
adding the stupid index, though.

-- Nathaniel

-- 
Details are all that matters; God dwells there, and you never get to
see Him if you don't struggle to get them right. -- Stephen Jay Gould


___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] Monotone speedup by adding additional database indices?

2007-10-10 Thread Ralf S. Engelschall
On Wed, Oct 10, 2007, Bruce Stephens wrote:

> Chad Walstrom <[EMAIL PROTECTED]> writes:
>
> [...]
> > Would there be a way to tell sqlite to ignore indices for given
> > operations, such as pulls?
>
> That strikes me as a low-level question that should be ignored (at
> least unless it causes some measurable problem).
>
> One would hope that SQLite will only (or mostly, anyway) use indexes
> when they'll be beneficial.  Index updates strike me as more likely to
> be a problem, but I doubt it makes sense to suggest sometimes not
> updating indexes.

SQLite has a sufficient optimizer and uses indices only where it really
makes sense. See http://www.sqlite.org/optoverview.html for some
details.
   Ralf S. Engelschall
   [EMAIL PROTECTED]
   www.engelschall.com



___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] Monotone speedup by adding additional database indices?

2007-10-10 Thread Ralf S. Engelschall
On Wed, Oct 10, 2007, Justin Patrin wrote:

> On 10/10/07, Ben Walton <[EMAIL PROTECTED]> wrote:
> > Indexes speed up read operations but slow down writes.  Which do we do
> > more of?  I'd optimize for the case that would benefit most.  I
> > suspect we do a fair amount of both.  In this case, maybe adding
> > indexes for mostly-read tables would be the way to go.
> >
>
> The size increase on the DB should also be investigated here. How much
> did your DB increase in size when the indexes were added?

With the database where all n.v.m.* revisions stay:

| $ sqlite mtn.db
| -- Loading resources from /u/rse/.sqliterc
| SQLite version 3.5.1
| Enter ".help" for instructions
| sqlite> VACUUM;
| $ ls -l
| total 141344
| -rw-r--r--  1 rse  rse  144629760 Oct 11 07:52 mtn.db
| $ sqlite mtn.db
| -- Loading resources from /u/rse/.sqliterc
| SQLite version 3.5.1
| Enter ".help" for instructions
| sqlite> CREATE INDEX revision_certs__id_name_value ON revision_certs (id, 
name, value);
| sqlite> CREATE INDEX public_keys__id ON public_keys (id);
| sqlite> VACUUM;
| $ ls -l
| total 147024
| -rw-r--r--  1 rse  rse  150446080 Oct 11 07:53 mtn.db
| $ bc
| bc 1.06
| Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
| This is free software with ABSOLUTELY NO WARRANTY.
| For details type `warranty'.
| scale = 2
| 150446080/144629760
| 1.04

So, the indices increased the database by just 4% and this IMHO is more
or less negligible...

   Ralf S. Engelschall
   [EMAIL PROTECTED]
   www.engelschall.com



___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] Monotone speedup by adding additional database indices?

2007-10-10 Thread Bruce Stephens
Chad Walstrom <[EMAIL PROTECTED]> writes:

[...]

> Would there be a way to tell sqlite to ignore indices for given
> operations, such as pulls?

That strikes me as a low-level question that should be ignored (at
least unless it causes some measurable problem).

One would hope that SQLite will only (or mostly, anyway) use indexes
when they'll be beneficial.  Index updates strike me as more likely to
be a problem, but I doubt it makes sense to suggest sometimes not
updating indexes.


___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] Monotone speedup by adding additional database indices?

2007-10-10 Thread Chad Walstrom
Ralph wrote:
>   CREATE INDEX revision_certs__id_name_value ON
>revision_certs (id, name, value);
>   CREATE INDEX public_keys__id ON
>public_keys (id);
>
> This dropped down the total execution time of the mentioned "mtn update"
> command by over 80%!

Ben wrote:
> Indexes speed up read operations but slow down writes.

I can't imagine a lot of writes happening to public_keys. ;-)
revision_certs would get four or more inserts per commit, and obviously
sync operations would add a bunch.  Commits have generally been pretty
fast for me.  Would there be a way to tell sqlite to ignore indices for
given operations, such as pulls?

Chad


___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] Monotone speedup by adding additional database indices?

2007-10-10 Thread Justin Patrin
On 10/10/07, Ben Walton <[EMAIL PROTECTED]> wrote:
> Indexes speed up read operations but slow down writes.  Which do we do
> more of?  I'd optimize for the case that would benefit most.  I
> suspect we do a fair amount of both.  In this case, maybe adding
> indexes for mostly-read tables would be the way to go.
>

The size increase on the DB should also be investigated here. How much
did your DB increase in size when the indexes were added?

> -Ben
>
> On 10/10/07, Ralf S. Engelschall <[EMAIL PROTECTED]> wrote:
> > Some Monotone operations really operate slower than what one would
> > expect in the first spot. Hence, I've today looked at the run-time of
> > a simple "mtn update" in a workspace which *is already* at h:n.v.m.
> > This "no-operation" command internally performs a dozend times the
> > following SQL queries:
> >
> >   SELECT id, name, value, keypair, signature
> >  FROM revision_certs WHERE id = ? AND name = ? AND value = ?
> >   SELECT keydata FROM public_keys WHERE id = ?
> >   SELECT id FROM public_keys WHERE id = ?
> >
> > The problem is that "revision_certs" and "public_keys" have not the
> > proper indices for those queries and hence full-table scans seem to
> > be performed. I did a quick test and added the following to indices
> > manually:
> >
> >   CREATE INDEX revision_certs__id_name_value ON
> >revision_certs (id, name, value);
> >   CREATE INDEX public_keys__id ON
> >public_keys (id);
> >
> > This dropped down the total execution time of the mentioned "mtn update"
> > command by over 80%! A "time mtn update" showed 0.450s on average before
> > and 0.080s on average afterwards. And this was really not any type of
> > in-depth analysis of the situation. I just created two obvious indices
> > for the most prominent queries which "mtn --debug update" showed me.
> >
> > What do we think? Should we investigate further and especially add
> > additional indices like the above to the Monotone database schema? Or is
> > there consensus that this type of speed optimization is just the root of
> > furthcoming evil and at least at this time should be still ignored at
> > all...
> >Ralf S. Engelschall
> >[EMAIL PROTECTED]
> >www.engelschall.com
> >
> >
> >
> > ___
> > Monotone-devel mailing list
> > Monotone-devel@nongnu.org
> > http://lists.nongnu.org/mailman/listinfo/monotone-devel
> >
>
>
> --
> ---
> Ben Walton <[EMAIL PROTECTED]>
>
> When one person suffers from a delusion, it is called insanity. When
> many people suffer from a delusion it is called Religion.
> Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance
>
> ---
>
>
> ___
> Monotone-devel mailing list
> Monotone-devel@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/monotone-devel
>


-- 
Justin Patrin


___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] Monotone speedup by adding additional database indices?

2007-10-10 Thread Ben Walton
Indexes speed up read operations but slow down writes.  Which do we do
more of?  I'd optimize for the case that would benefit most.  I
suspect we do a fair amount of both.  In this case, maybe adding
indexes for mostly-read tables would be the way to go.

-Ben

On 10/10/07, Ralf S. Engelschall <[EMAIL PROTECTED]> wrote:
> Some Monotone operations really operate slower than what one would
> expect in the first spot. Hence, I've today looked at the run-time of
> a simple "mtn update" in a workspace which *is already* at h:n.v.m.
> This "no-operation" command internally performs a dozend times the
> following SQL queries:
>
>   SELECT id, name, value, keypair, signature
>  FROM revision_certs WHERE id = ? AND name = ? AND value = ?
>   SELECT keydata FROM public_keys WHERE id = ?
>   SELECT id FROM public_keys WHERE id = ?
>
> The problem is that "revision_certs" and "public_keys" have not the
> proper indices for those queries and hence full-table scans seem to
> be performed. I did a quick test and added the following to indices
> manually:
>
>   CREATE INDEX revision_certs__id_name_value ON
>revision_certs (id, name, value);
>   CREATE INDEX public_keys__id ON
>public_keys (id);
>
> This dropped down the total execution time of the mentioned "mtn update"
> command by over 80%! A "time mtn update" showed 0.450s on average before
> and 0.080s on average afterwards. And this was really not any type of
> in-depth analysis of the situation. I just created two obvious indices
> for the most prominent queries which "mtn --debug update" showed me.
>
> What do we think? Should we investigate further and especially add
> additional indices like the above to the Monotone database schema? Or is
> there consensus that this type of speed optimization is just the root of
> furthcoming evil and at least at this time should be still ignored at
> all...
>Ralf S. Engelschall
>[EMAIL PROTECTED]
>www.engelschall.com
>
>
>
> ___
> Monotone-devel mailing list
> Monotone-devel@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/monotone-devel
>


-- 
---
Ben Walton <[EMAIL PROTECTED]>

When one person suffers from a delusion, it is called insanity. When
many people suffer from a delusion it is called Religion.
Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance

---


___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


[Monotone-devel] Monotone speedup by adding additional database indices?

2007-10-10 Thread Ralf S. Engelschall
Some Monotone operations really operate slower than what one would
expect in the first spot. Hence, I've today looked at the run-time of
a simple "mtn update" in a workspace which *is already* at h:n.v.m.
This "no-operation" command internally performs a dozend times the
following SQL queries:

  SELECT id, name, value, keypair, signature
 FROM revision_certs WHERE id = ? AND name = ? AND value = ?
  SELECT keydata FROM public_keys WHERE id = ?
  SELECT id FROM public_keys WHERE id = ?

The problem is that "revision_certs" and "public_keys" have not the
proper indices for those queries and hence full-table scans seem to
be performed. I did a quick test and added the following to indices
manually:

  CREATE INDEX revision_certs__id_name_value ON
   revision_certs (id, name, value);
  CREATE INDEX public_keys__id ON
   public_keys (id);

This dropped down the total execution time of the mentioned "mtn update"
command by over 80%! A "time mtn update" showed 0.450s on average before
and 0.080s on average afterwards. And this was really not any type of
in-depth analysis of the situation. I just created two obvious indices
for the most prominent queries which "mtn --debug update" showed me.

What do we think? Should we investigate further and especially add
additional indices like the above to the Monotone database schema? Or is
there consensus that this type of speed optimization is just the root of
furthcoming evil and at least at this time should be still ignored at
all...
   Ralf S. Engelschall
   [EMAIL PROTECTED]
   www.engelschall.com



___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel