Re: Worried about single-db performance
On 06/09/2010 15:18, Bert Huijben wrote:

> -Original Message-
> From: Matthew Bentham [mailto:mj...@artvps.com]
> Sent: Monday 6 September 2010 15:07
> To: Justin Erenkrantz
> Cc: Bert Huijben; Greg Stein; Johan Corveleyn; Subversion Development
> Subject: Re: Worried about single-db performance
>
>> On 04/09/2010 17:33, Justin Erenkrantz wrote:
>>> Aha. Adding exclusive locking into our pragma [http://www.sqlite.org/pragma.html] calls in svn_sqlite__open:
>>>
>>>   PRAGMA locking_mode=exclusive;
>>>
>>> brings the time for svn st down from 0.680 to 0.310 seconds. And, yes, the I/O percentages drop dramatically:
>>>
>>>   ~/Applications/svn-trunk/bin/svn st  0.37s user 0.31s system 99% cpu 0.684 total
>>>   ~/Applications/svn-trunk/bin/svn st  0.26s user 0.05s system 98% cpu 0.308 total
>>>
>>> I *think* we'd be okay with having SQLite holding its read/write locks for the duration of our database connection rather than constantly giving them up and refetching them between every read and write operation. As I read the SQLite docs, we should still be able to have shared readers in this model - but it'd create the case where idle shared readers (due to network I/O?) would block an attempted writer. With a normal locking mode, a writer could intercept a reader if it were idle and not active. However, I'd think our other locks would interfere with this anyway...so I think it'd be safe.
>>>
>>> Thoughts? -- justin
>>
>> I think it's essential to use exclusive locking for performance reasons; without it we will get just as many individual file ops as in 1.6 (and it's the number of file ops which causes the performance problem on Windows).
>
> Did you actually try a shared lock before suggesting this?

I haven't re-run those tests on single-db. The results I linked to compare locking_mode=NORMAL and locking_mode=EXCLUSIVE on otherwise identical code.

> Getting a shared lock actually gives me better performance on this read-only operation than an exclusive lock, and it doesn't block out other clients (which would be a breaking change from 1.6)

I understand locking_mode=EXCLUSIVE to allow shared read-only access. Don't we block write access when other clients are reading already? Or are you worried about where we're releasing the database connection? I'm surprised locking_mode=NORMAL could ever have better performance given that the number of lock operations must be strictly greater and everything else is the same.

> Getting an exclusive lock on every operation would completely disable Subversion's most popular client: TortoiseSVN.

I didn't realise this; you are of course right that that would make it unacceptable. I don't really understand why it would break TortoiseSVN: does it take write access and then not release it somehow?

Matthew
Re: Worried about single-db performance
On 07/09/2010 13:02, Bert Huijben wrote:

> -Original Message-
> From: Matthew Bentham [mailto:mj...@artvps.com]
> Sent: Tuesday 7 September 2010 13:48
> To: Bert Huijben
> Cc: 'Justin Erenkrantz'; 'Greg Stein'; 'Johan Corveleyn'; 'Subversion Development'
> Subject: Re: Worried about single-db performance
>
>> I didn't realise this; you are of course right that that would make it unacceptable. I don't really understand why it would break TortoiseSVN: does it take write access and then not release it somehow?
>
> SQLite needs a shared *read* lock to *read*. See http://www.sqlite.org/atomiccommit.html. (Invoking 'svn status' never obtains a write lock; see that document)
>
> SQLite only upgrades that read lock to a write (or actually reserved) lock when you perform a db operation that has to change the database. Further on (e.g. after too many changes; see the documentation for more reasons) this is upgraded to an exclusive lock that blocks all readers and writers out of the db, but it tries to keep this time as short as possible.
>
> Your original suggestion is just to make any *reader* block any other *reader*, which breaks the Subversion world. (Just running svn update in 1.6 has about 5 simultaneous independent readers in some phases of update. Most GUI Subversion clients I know use multiple client instances at the same time, so they would all have to be rewritten if we obtain an exclusive lock for reading).

Sorry, I didn't mean we should take exclusive locks for every transaction, just that we should use PRAGMA locking_mode=EXCLUSIVE. According to the documentation (http://www.sqlite.org/pragma.html) that makes transactions obtain a shared lock for reading, which is upgraded to an exclusive lock for writing and not released until the database connection is closed. I've tried it a couple of times in svn.exe and it always improves performance (over locking_mode=NORMAL) and hasn't caused me problems.

Admittedly I haven't tried within the last couple of weeks and I'm afraid I don't have time right now. Am I misreading the documentation? It says:

  The first time the database is read in EXCLUSIVE mode, a shared lock is obtained and held. The first time the database is written, an exclusive lock is obtained and held.

Matthew
Re: Worried about single-db performance
On 07.09.2010 14:29, Matthew Bentham wrote:

> On 07/09/2010 13:02, Bert Huijben wrote:
>> -Original Message-
>> From: Matthew Bentham [mailto:mj...@artvps.com]
>> Sent: Tuesday 7 September 2010 13:48
>> To: Bert Huijben
>> Cc: 'Justin Erenkrantz'; 'Greg Stein'; 'Johan Corveleyn'; 'Subversion Development'
>> Subject: Re: Worried about single-db performance
>>
>>> I didn't realise this; you are of course right that that would make it unacceptable. I don't really understand why it would break TortoiseSVN: does it take write access and then not release it somehow?
>>
>> SQLite needs a shared *read* lock to *read*. See http://www.sqlite.org/atomiccommit.html. (Invoking 'svn status' never obtains a write lock; see that document)
>>
>> SQLite only upgrades that read lock to a write (or actually reserved) lock when you perform a db operation that has to change the database. Further on (e.g. after too many changes; see the documentation for more reasons) this is upgraded to an exclusive lock that blocks all readers and writers out of the db, but it tries to keep this time as short as possible.
>>
>> Your original suggestion is just to make any *reader* block any other *reader*, which breaks the Subversion world. (Just running svn update in 1.6 has about 5 simultaneous independent readers in some phases of update. Most GUI Subversion clients I know use multiple client instances at the same time, so they would all have to be rewritten if we obtain an exclusive lock for reading).
>
> Sorry, I didn't mean we should take exclusive locks for every transaction, just that we should use PRAGMA locking_mode=EXCLUSIVE. According to the documentation (http://www.sqlite.org/pragma.html) that makes transactions obtain a shared lock for reading, which is upgraded to an exclusive lock for writing and not released until the database connection is closed. I've tried it a couple of times in svn.exe and it always improves performance (over locking_mode=NORMAL) and hasn't caused me problems.
>
> Admittedly I haven't tried within the last couple of weeks and I'm afraid I don't have time right now. Am I misreading the documentation? It says:
>
>   The first time the database is read in EXCLUSIVE mode, a shared lock is obtained and held. The first time the database is written, an exclusive lock is obtained and held.

That "and held" is the problem, IMO. A long-term connection that mostly reads but just happens to write something once will not drop the exclusive lock until the database connection is closed.

-- Brane
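The "and held" behaviour described above can be reproduced outside Subversion. The following is a minimal sketch using Python's standard sqlite3 module, not Subversion's own code; the file name wc.db, the nodes table and the function name are assumptions made up for the illustration. One connection sets locking_mode=EXCLUSIVE and writes a single row; a second connection then fails to read until the first one is closed.

```python
import os
import sqlite3
import tempfile


def write_once_then_hold():
    """Return (reader_blocked, rows_after_close) for a writer in EXCLUSIVE mode."""
    # Hypothetical scratch database, not a real working copy database.
    path = os.path.join(tempfile.mkdtemp(), "wc.db")

    setup = sqlite3.connect(path, isolation_level=None)
    setup.execute("CREATE TABLE nodes (local_relpath TEXT)")
    setup.close()

    # Long-lived connection that happens to write exactly once.
    holder = sqlite3.connect(path, isolation_level=None)
    holder.execute("PRAGMA locking_mode=EXCLUSIVE").fetchall()
    holder.execute("INSERT INTO nodes VALUES ('trunk')")

    # timeout=0 makes a lock conflict fail at once instead of retrying.
    other = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        other.execute("SELECT count(*) FROM nodes").fetchall()
        reader_blocked = False
    except sqlite3.OperationalError:  # "database is locked"
        reader_blocked = True

    # Only closing the writing connection releases the exclusive lock.
    holder.close()
    rows_after_close = other.execute("SELECT count(*) FROM nodes").fetchall()[0][0]
    other.close()
    return reader_blocked, rows_after_close
```

With locking_mode=NORMAL the INSERT would release its lock as soon as it committed, so the second connection's read would be expected to succeed straight away.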
Re: Worried about single-db performance
On 04/09/2010 17:33, Justin Erenkrantz wrote:

> Aha. Adding exclusive locking into our pragma [http://www.sqlite.org/pragma.html] calls in svn_sqlite__open:
>
>   PRAGMA locking_mode=exclusive;
>
> brings the time for svn st down from 0.680 to 0.310 seconds. And, yes, the I/O percentages drop dramatically:
>
>   ~/Applications/svn-trunk/bin/svn st  0.37s user 0.31s system 99% cpu 0.684 total
>   ~/Applications/svn-trunk/bin/svn st  0.26s user 0.05s system 98% cpu 0.308 total
>
> I *think* we'd be okay with having SQLite holding its read/write locks for the duration of our database connection rather than constantly giving them up and refetching them between every read and write operation. As I read the SQLite docs, we should still be able to have shared readers in this model - but it'd create the case where idle shared readers (due to network I/O?) would block an attempted writer. With a normal locking mode, a writer could intercept a reader if it were idle and not active. However, I'd think our other locks would interfere with this anyway...so I think it'd be safe.
>
> Thoughts? -- justin

I think it's essential to use exclusive locking for performance reasons; without it we will get just as many individual file ops as in 1.6 (and it's the number of file ops which causes the performance problem on Windows).

I got the same results as you, pre single-db, in this message (near the end of it) and in my other message in that thread: http://svn.haxx.se/dev/archive-2010-02/0239.shtml

For fun you can try 'journal_mode=MEMORY', which makes it go really, really fast (but is unsafe wrt atomic operations under certain termination conditions).

Matthew
RE: Worried about single-db performance
-Original Message-
From: Matthew Bentham [mailto:mj...@artvps.com]
Sent: Monday 6 September 2010 15:07
To: Justin Erenkrantz
Cc: Bert Huijben; Greg Stein; Johan Corveleyn; Subversion Development
Subject: Re: Worried about single-db performance

> On 04/09/2010 17:33, Justin Erenkrantz wrote:
>> Aha. Adding exclusive locking into our pragma [http://www.sqlite.org/pragma.html] calls in svn_sqlite__open:
>>
>>   PRAGMA locking_mode=exclusive;
>>
>> brings the time for svn st down from 0.680 to 0.310 seconds. And, yes, the I/O percentages drop dramatically:
>>
>>   ~/Applications/svn-trunk/bin/svn st  0.37s user 0.31s system 99% cpu 0.684 total
>>   ~/Applications/svn-trunk/bin/svn st  0.26s user 0.05s system 98% cpu 0.308 total
>>
>> I *think* we'd be okay with having SQLite holding its read/write locks for the duration of our database connection rather than constantly giving them up and refetching them between every read and write operation. As I read the SQLite docs, we should still be able to have shared readers in this model - but it'd create the case where idle shared readers (due to network I/O?) would block an attempted writer. With a normal locking mode, a writer could intercept a reader if it were idle and not active. However, I'd think our other locks would interfere with this anyway...so I think it'd be safe.
>>
>> Thoughts? -- justin
>
> I think it's essential to use exclusive locking for performance reasons; without it we will get just as many individual file ops as in 1.6 (and it's the number of file ops which causes the performance problem on Windows).

Did you actually try a shared lock before suggesting this?

Getting a shared lock actually gives me better performance on this read-only operation than an exclusive lock, and it doesn't block out other clients (which would be a breaking change from 1.6)

Getting an exclusive lock on every operation would completely disable Subversion's most popular client: TortoiseSVN.

That is not something we can decide in just a few mails.

Bert
RE: Worried about single-db performance
-Original Message-
From: justin.erenkra...@gmail.com [mailto:justin.erenkra...@gmail.com] On Behalf Of Justin Erenkrantz
Sent: Saturday 4 September 2010 8:33
To: Greg Stein
Cc: Johan Corveleyn; Subversion Development
Subject: Re: Worried about single-db performance

> On Fri, Sep 3, 2010 at 8:39 AM, Greg Stein gst...@gmail.com wrote:
>> It should already be faster. Obviously, that's not the case.
>
> I just spent a little bit of time with Shark and gdb. A cold run of 'svn st' against Subversion trunk checkouts for 1.6 yields 0.402 seconds and 1.7 is 0.919 seconds. Hot runs for 1.6 are about 0.055 seconds with 1.7 at 0.750 seconds.
>
> One striking difference in the perf profile between 1.6 and trunk is that we seem to do a larger number of stat() calls in 1.7. From looking at the traces and code, I *think* svn_wc__db_pdh_parse_local_abspath's call to svn_io_check_special_path may be in play here:

SQLite also does a stat call per statement, unless there is already a shared lock open, just to check whether another process has opened a transaction.

(On Windows this specific stat, which checks for other processes operating on the same db, is the performance killer for svn status: just this stat takes more than 50% of the total processing time).

Bert
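One way to keep a shared lock open across many statements, without changing the locking mode, is to wrap the reads in a single explicit transaction. The sketch below uses Python's standard sqlite3 module rather than Subversion's C wrappers, and the nodes table, its columns and both function names are hypothetical. It shows the pattern only; whether it removes the per-statement check in a given build would still need to be measured.

```python
import os
import sqlite3
import tempfile


def make_demo_db():
    """Create a small scratch database (hypothetical schema) and return its path."""
    path = os.path.join(tempfile.mkdtemp(), "wc.db")
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY, kind TEXT)")
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?)",
        [("trunk", "dir"), ("trunk/README", "file")],
    )
    conn.close()
    return path


def kinds_in_one_transaction(path, relpaths):
    """Run one SELECT per path, all inside a single read transaction."""
    conn = sqlite3.connect(path, isolation_level=None)
    kinds = {}
    conn.execute("BEGIN")  # deferred: the first SELECT takes the shared lock
    try:
        for relpath in relpaths:
            rows = conn.execute(
                "SELECT kind FROM nodes WHERE local_relpath = ?", (relpath,)
            ).fetchall()
            kinds[relpath] = rows[0][0] if rows else None
    finally:
        conn.execute("COMMIT")  # the shared lock is released here
        conn.close()
    return kinds
```

Unlike locking_mode=EXCLUSIVE, the lock here is given up at COMMIT, so other clients are only kept from writing for the duration of the batch of reads.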
Re: Worried about single-db performance
On 04.09.2010 11:23, Bert Huijben wrote:

> -Original Message-
> From: justin.erenkra...@gmail.com [mailto:justin.erenkra...@gmail.com] On Behalf Of Justin Erenkrantz
> Sent: Saturday 4 September 2010 8:33
> To: Greg Stein
> Cc: Johan Corveleyn; Subversion Development
> Subject: Re: Worried about single-db performance
>
>> On Fri, Sep 3, 2010 at 8:39 AM, Greg Stein gst...@gmail.com wrote:
>>> It should already be faster. Obviously, that's not the case.
>>
>> I just spent a little bit of time with Shark and gdb. A cold run of 'svn st' against Subversion trunk checkouts for 1.6 yields 0.402 seconds and 1.7 is 0.919 seconds. Hot runs for 1.6 are about 0.055 seconds with 1.7 at 0.750 seconds.
>>
>> One striking difference in the perf profile between 1.6 and trunk is that we seem to do a larger number of stat() calls in 1.7. From looking at the traces and code, I *think* svn_wc__db_pdh_parse_local_abspath's call to svn_io_check_special_path may be in play here:
>
> SQLite also does a stat call per statement, unless there is already a shared lock open, just to check whether another process has opened a transaction.
>
> (On Windows this specific stat, which checks for other processes operating on the same db, is the performance killer for svn status: just this stat takes more than 50% of the total processing time).

Hmmm ... easy solution then, just fork off a process that opens the database and these stats should magically vanish ... :)

-- Brane
Re: Worried about single-db performance
On Sat, Sep 4, 2010 at 2:23 AM, Bert Huijben b...@qqmail.nl wrote:

> SQLite also does a stat call per statement, unless there is already a shared lock open, just to check whether another process has opened a transaction.
>
> (On Windows this specific stat, which checks for other processes operating on the same db, is the performance killer for svn status: just this stat takes more than 50% of the total processing time).

Aha. Adding exclusive locking into our pragma [http://www.sqlite.org/pragma.html] calls in svn_sqlite__open:

  PRAGMA locking_mode=exclusive;

brings the time for svn st down from 0.680 to 0.310 seconds. And, yes, the I/O percentages drop dramatically:

  ~/Applications/svn-trunk/bin/svn st  0.37s user 0.31s system 99% cpu 0.684 total
  ~/Applications/svn-trunk/bin/svn st  0.26s user 0.05s system 98% cpu 0.308 total

I *think* we'd be okay with having SQLite holding its read/write locks for the duration of our database connection rather than constantly giving them up and refetching them between every read and write operation. As I read the SQLite docs, we should still be able to have shared readers in this model - but it'd create the case where idle shared readers (due to network I/O?) would block an attempted writer. With a normal locking mode, a writer could intercept a reader if it were idle and not active. However, I'd think our other locks would interfere with this anyway...so I think it'd be safe.

Thoughts?

-- justin
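The reader/writer interaction described in this message can be checked with a small experiment. This is a sketch in Python's standard sqlite3 module under stated assumptions (the wc.db file, the nodes table and the function name are invented for the illustration, and it does not go through svn_sqlite__open). A connection in EXCLUSIVE locking mode performs one read and then sits idle; a second connection is still expected to be able to read, but its write should fail while the idle reader stays open.

```python
import os
import sqlite3
import tempfile


def idle_reader_blocks_writer():
    """Return (second_reader_ok, writer_blocked) while an EXCLUSIVE-mode reader idles."""
    # Hypothetical scratch database, not a real working copy database.
    path = os.path.join(tempfile.mkdtemp(), "wc.db")

    setup = sqlite3.connect(path, isolation_level=None)
    setup.execute("CREATE TABLE nodes (local_relpath TEXT)")
    setup.execute("INSERT INTO nodes VALUES ('trunk')")
    setup.close()

    # Reader that keeps its shared lock for the lifetime of the connection.
    reader = sqlite3.connect(path, isolation_level=None)
    reader.execute("PRAGMA locking_mode=EXCLUSIVE").fetchall()
    reader.execute("SELECT * FROM nodes").fetchall()

    # timeout=0 makes a lock conflict fail at once instead of retrying.
    other = sqlite3.connect(path, timeout=0, isolation_level=None)
    rows = other.execute("SELECT count(*) FROM nodes").fetchall()
    second_reader_ok = rows[0][0] == 1

    try:
        other.execute("INSERT INTO nodes VALUES ('branches')")
        writer_blocked = False
    except sqlite3.OperationalError:  # "database is locked"
        writer_blocked = True

    other.close()
    reader.close()
    return second_reader_ok, writer_blocked
```

The pragma only affects the connection that issues it, so every client that wants the saving has to set it, and every such client then becomes a potential idle reader of this kind.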
Worried about single-db performance
Hi devs,

From what I understand about the performance problems of WC-1 vs. WC-NG, and what I'm reading on this list, I expect(ed) a huge performance boost from WC-NG for certain client operations (especially on Windows, where the locking of WC-1 is quite problematic). Also, I knew I had to wait for single-db to see any real performance benefits. So after you guys switched on single-db, I eagerly gave trunk a spin... Now I'm a little worried, because I don't see any great speed increases (quite the contrary). Some details below.

Maybe it's just too early to be looking at this (maybe it's a simple matter of optimizing the data model, adding indexes, optimizing some code paths, ...). So it's fine if you guys say: chill, just wait a couple more weeks. I just need to know whether I should be worried or not :-).

Some details ...

Setup:
- Win XP 32-bit client machine, with antivirus switched off.
- Single-db client: tr...@992042 (yesterday), release build with VCE2008
- 1.6 client: 1.6.5 binary from tigris.org that I still had lying around.
- Medium size working copy (944 dirs, 10286 files), once checked out with the 1.6 client (WC-1), once checked out with the trunk-single-db client.
- 1st run means after reboot, 2nd run means immediately after 1st run.

Numbers:

1) Status (svn st)

1.6 client 1st run:
real    0m41.593s
user    0m0.015s
sys     0m0.015s

1.6 client 2nd run:
real    0m1.203s
user    0m0.015s
sys     0m0.031s

Single-db client 1st run:
real    0m34.984s
user    0m0.015s
sys     0m0.031s

Single-db client 2nd run:
real    0m6.938s
user    0m0.015s
sys     0m0.031s

2) Update (no changes, wc already up to date) (svn up)

1.6 client 1st run:
real    0m38.484s
user    0m0.015s
sys     0m0.015s

1.6 client 2nd run:
real    0m1.141s
user    0m0.015s
sys     0m0.015s

Single-db client 1st run:
real    0m31.375s
user    0m0.015s
sys     0m0.031s

Single-db client 2nd run:
real    0m5.468s
user    0m0.031s
sys     0m0.015s

Anyone able to take away my worries :-) ?

Cheers,
-- Johan
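To put the figures above side by side, the wall-clock ("real") times can be turned into ratios. The short Python sketch below only repeats the numbers from this message; the dictionary layout and the function name are illustrative choices.

```python
# Wall-clock ("real") seconds copied from the message above: (1st run, 2nd run).
timings = {
    "svn st": {"1.6": (41.593, 1.203), "single-db": (34.984, 6.938)},
    "svn up": {"1.6": (38.484, 1.141), "single-db": (31.375, 5.468)},
}


def slowdown(command, run):
    """single-db time divided by 1.6 time; run 0 = after reboot, run 1 = warm."""
    return timings[command]["single-db"][run] / timings[command]["1.6"][run]


for command in timings:
    print(
        f"{command}: 1st run x{slowdown(command, 0):.2f}, "
        f"2nd run x{slowdown(command, 1):.2f}"
    )
```

On these numbers the first runs after a reboot are somewhat faster with single-db (ratios of about 0.84 and 0.82), while the second, warm runs are roughly 5.8 and 4.8 times slower.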
Re: Worried about single-db performance
On Fri, Sep 3, 2010 at 06:09, Johan Corveleyn jcor...@gmail.com wrote:

> Hi devs,
>
> From what I understand about the performance problems of WC-1 vs. WC-NG, and what I'm reading on this list, I expect(ed) a huge performance boost from WC-NG for certain client operations (especially on Windows, where the locking of WC-1 is quite problematic). Also, I knew I had to wait for single-db to see any real performance benefits. So after you guys switched on single-db, I eagerly gave trunk a spin... Now I'm a little worried, because I don't see any great speed increases (quite the contrary). Some details below.
>
> Maybe it's just too early to be looking at this (maybe it's a simple matter of optimizing the data model, adding indexes, optimizing some code paths, ...). So it's fine if you guys say: chill, just wait a couple more weeks. I just need to know whether I should be worried or not :-).

It should already be faster. Obviously, that's not the case.

My expectation is that it would be faster, and then we'd do some perf improvements to make it even faster. Sounds like we definitely have to do some of those other improvements.

We have a schema change to make, and once that is done, then we can start looking at the performance. There could be lots of SQL queries that need to be optimized. I also *know* that we issue way too many queries. There should be ways to avoid a lot of those queries. I'd like to avoid any caching, and rely on SQLite to maintain in-memory caches. My gut says we just need to reduce the number of queries.

Cheers,
-g
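As an illustration of what "reduce the number of queries" can mean in practice, the sketch below compares one query per child with a single query per directory. It uses Python's standard sqlite3 module and a made-up nodes table (the parent_relpath, local_relpath and kind columns are assumptions for the example, not a description of the actual schema or of the queries Subversion issues).

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with a hypothetical nodes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (parent_relpath TEXT, local_relpath TEXT, kind TEXT)"
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [
            ("", "trunk", "dir"),
            ("trunk", "trunk/README", "file"),
            ("trunk", "trunk/Makefile", "file"),
        ],
    )
    return conn


def kinds_one_query_per_child(conn, children):
    """N statements: look up each child separately."""
    kinds = {}
    for child in children:
        rows = conn.execute(
            "SELECT kind FROM nodes WHERE local_relpath = ?", (child,)
        ).fetchall()
        kinds[child] = rows[0][0]
    return kinds


def kinds_single_query(conn, parent):
    """One statement: fetch every child of the directory at once."""
    return dict(
        conn.execute(
            "SELECT local_relpath, kind FROM nodes WHERE parent_relpath = ?",
            (parent,),
        )
    )
```

Both functions return the same mapping for a directory's children; the second pays the per-statement overhead once per directory instead of once per file.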