> -----Original Message-----
> From: Hyrum K Wright [mailto:[email protected]]
> Sent: maandag 16 mei 2011 11:18
> To: Bert Huijben
> Cc: Branko Čibej; [email protected]
> Subject: Re: SQLite and the LIKE operator
>
> On Mon, May 16, 2011 at 8:28 AM, Bert Huijben <[email protected]> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Hyrum K Wright [mailto:[email protected]]
> >> Sent: maandag 16 mei 2011 9:39
> >> To: Branko Čibej
> >> Cc: [email protected]
> >> Subject: Re: SQLite and the LIKE operator
> >>
> >> 2011/5/16 Branko Čibej <[email protected]>:
> >> > On 16.05.2011 03:13, Hyrum K Wright wrote:
> >> >> Several places in wc_db we use the following pattern to select all
> >> >> nodes with a common tree ancestor:
> >> >> WHERE wc_id = ?1 AND (local_relpath = ?2 OR local_relpath LIKE ?3
> >> ESCAPE '#')
> >> >>
> >> >> While this works, there was some concern about whether or not
> SQLite
> >> >> was using the proper indicies when executing this query. By examining
> >> >> the output for 'EXPLAIN QUERY PLAN' on some of the relevant SELECT
> >> >> statements, I believe it does use the indicies as intended.
> >> >>
> >> >> However, I stumbled across an alternate implementation which I
> believe
> >> >> has some merit. Instead of the above clause, we could use:
> >> >> WHERE wc_id = ?1 AND substr(local_relpath, 1, length(?2)) = ?2
> >
> > This also needs a table scan, as SQLite can't look through the substr to
> > find
> that it can use the index for the result.
>
> The SQLite query analyzer states that this executes a SEARCH, not a
> SCAN, which indicates the use of the index.
>
> > My guess is that
> > WHERE wc_id = ?1 AND ((local_relpath = ?2) OR (local_relpath > ?2 || '/'
> AND local_relpath < ?2 || '0'))
> > is most likely the most efficient form we can get in SQLite as the constant
> string directly map to an index, but we should really create a few tests to
> verify these guesses.
>
> I haven't done any tests, either, but I'm interested in an expression
> which doesn't require us to construct an additional parameter to the
> SQL query in C.
>
> > I'm going to the Elego office now. See you there? :)
Okay, the numbers are in... I wrote a test on a DB with more than 1 million
nodes.
1024 directories, with 1024 files. 1024 files added to the root dir and before
I started there was the normal greek tree, so that is in the same db
-- STMT_SELECT_LIKE_ESCAPED
SELECT local_relpath FROM NODES
WHERE wc_id=?1 AND op_depth=0
AND ((local_relpath=?2)
OR (local_relpath LIKE ?3 ESCAPE '#'))
-- STMT_SELECT_LIKE_NO_ESCAPE
SELECT local_relpath FROM NODES
WHERE wc_id=?1 AND op_depth=0
AND ((local_relpath=?2)
OR (local_relpath LIKE ?3))
-- STMT_SELECT_SUBSTR
SELECT local_relpath FROM NODES
WHERE wc_id=?1 AND op_depth=0
AND ((local_relpath=?2)
OR (SUBSTR(local_relpath, 1, LENGTH(?2)+1) = ?2 || '/'))
-- STMT_SELECT_BETWEEN
SELECT local_relpath FROM NODES
WHERE wc_id=?1 AND op_depth=0
AND ((local_relpath=?2)
OR (local_relpath > ?2 || '/' AND local_relpath < ?2 || '0'))
-- STMT_SELECT_BETWEEN2
SELECT local_relpath FROM NODES
WHERE wc_id=?1 AND op_depth=0
AND ((local_relpath=?2)
OR (local_relpath > ?3 AND local_relpath < ?4))
All the queries returned the same 1025 answers when I looked for all nodes at
and below "300".
I ran the tests 100 times to remove some jitter and the results are (in
microseconds, on my Windows 7 notebook, SQLite 3.7.5):
STMT_SELECT_LIKE_ESCAPED: 743252
STMT_SELECT_LIKE_NO_ESCAPE: 683169
STMT_SELECT_SUBSTR: 885900
STMT_SELECT_BETWEEN: 555341
STMT_SELECT_BETWEEN2: 557631
So the select with > and < is the fastest query style. (Not surprising as
SQLite in some cases optimizes like to this).
Generating the compare keys in SQLite is slightly faster than using static c
strings. So creating them in cwould be even slower.
I added the rough patch I used to create these results to this mail. (I don't
intend to commit it in this style)
Bert
>
> Yes. :)
>
> -Hyrum
Index: dev/build.conf
===================================================================
--- dev/build.conf (revision 1103708)
+++ dev/build.conf (working copy)
@@ -396,6 +396,11 @@ type = sql-header
path = subversion/libsvn_subr
sources = internal_statements.sql
+[db_test_statements]
+description = Some test sql for the db tests
+type = sql-header
+path = subversion/tests/libsvn_wc
+sources = db_test_statements.sql
# ----------------------------------------------------------------------------
#
Index: dev/subversion/tests/libsvn_wc/db-test.c
===================================================================
--- dev/subversion/tests/libsvn_wc/db-test.c (revision 1103708)
+++ dev/subversion/tests/libsvn_wc/db-test.c (working copy)
@@ -47,7 +47,6 @@
#include "../svn_test.h"
-
#define ROOT_ONE "http://example.com/one"
#define ROOT_TWO "http://example.com/two"
#define ROOT_THREE "http://example.com/three"
@@ -318,7 +317,13 @@ static const char * const TESTING_DATA = (
WC_QUERIES_SQL_DECLARE_STATEMENTS(statements);
+#define SVN_DB_TESTS__SQLITE_PERFORMANCE
+#ifdef SVN_DB_TESTS__SQLITE_PERFORMANCE
+#include "db_test_statements.h"
+DB_TEST_STATEMENTS_SQL_DECLARE_STATEMENTS(test_statements);
+#endif
+
static svn_error_t *
create_fake_wc(const char *subdir, int format, apr_pool_t *scratch_pool)
{
@@ -1579,6 +1584,214 @@ test_externals_store(apr_pool_t *pool)
return SVN_NO_ERROR;
}
+static svn_error_t *
+insert_1M_paths(void *baton, svn_sqlite__db_t *sdb, apr_pool_t *scratch_pool)
+{
+ svn_sqlite__stmt_t *stmt;
+ apr_pool_t *iterpool = svn_pool_create(scratch_pool);
+ apr_int64_t wc_id = 1;
+ int i;
+
+ SVN_ERR(svn_sqlite__get_statement(&stmt, sdb, STMT_INSERT_STUPID_NODE));
+
+ for (i = 0; i < 1024; i++)
+ {
+ const char *dir_relpath;
+ const char *file_relpath;
+ int j;
+
+ svn_pool_clear(iterpool);
+ dir_relpath = apr_psprintf(iterpool, "%d", i);
+ file_relpath = apr_psprintf(iterpool, "%d_file", i);
+
+ SVN_ERR(svn_sqlite__bindf(stmt, "iss",
+ wc_id,
+ dir_relpath,
+ ""));
+ SVN_ERR(svn_sqlite__insert(NULL, stmt));
+
+ SVN_ERR(svn_sqlite__bindf(stmt, "iss",
+ wc_id,
+ file_relpath,
+ ""));
+ SVN_ERR(svn_sqlite__insert(NULL, stmt));
+
+ for (j = 0; j < 1024; j++)
+ {
+ const char *item_name = apr_psprintf(iterpool, "%d", j);
+ const char *item_relpath = svn_relpath_join(dir_relpath,
+ item_name, iterpool);
+
+ SVN_ERR(svn_sqlite__bindf(stmt, "iss",
+ wc_id,
+ item_relpath,
+ dir_relpath));
+
+ SVN_ERR(svn_sqlite__insert(NULL, stmt));
+ }
+ }
+ return SVN_NO_ERROR;
+}
+
+static svn_error_t *
+test_sqlite_select_performance(apr_pool_t *pool)
+{
+ svn_wc__db_t *db;
+ svn_sqlite__db_t *sdb;
+ svn_sqlite__stmt_t *stmt;
+ const char *local_abspath;
+ apr_int64_t wc_id=1;
+ apr_time_t t_init, t_inserted, t_0, t_1, t_2, t_3, t_4, t_5;
+ apr_int64_t s_1=0, s_2=0, s_3=0, s_4=0, s_5=0;
+ int i;
+
+ SVN_ERR(create_open(&db, &local_abspath,
+ "sqlite_select_performance", SVN_WC__VERSION, pool));
+
+ /*SVN_ERR(svn_wc__db_init(db, local_abspath, "", "htp://q", "uui-d", 0,
+ svn_depth_infinity, pool));*/
+
+ SVN_DBG(("Path = %s\n", svn_dirent_local_style(local_abspath, pool)));
+ SVN_ERR(svn_sqlite__open(&sdb,
+ svn_dirent_join(local_abspath, ".svn/wc.db", pool),
+ svn_sqlite__mode_readwrite, test_statements,
+ 0, NULL, pool, pool));
+
+ apr_time_clock_hires(pool);
+
+ t_init = apr_time_now();
+ SVN_ERR(svn_sqlite__with_lock(sdb, insert_1M_paths, NULL, pool));
+
+ t_inserted = apr_time_now();
+
+ for(i = 0; i < 100; i++)
+ {
+ t_0 = apr_time_now();
+ {
+ int n = -1;
+ svn_boolean_t have_row;
+
+ SVN_ERR(svn_sqlite__get_statement(&stmt, sdb,
+ STMT_SELECT_LIKE_ESCAPED));
+ SVN_ERR(svn_sqlite__bindf(stmt, "iss", wc_id, "300", "300/%"));
+
+ do
+ {
+ n++;
+ SVN_ERR(svn_sqlite__step(&have_row, stmt));
+ }
+ while (have_row);
+
+ SVN_TEST_ASSERT(n == 1025);
+ SVN_ERR(svn_sqlite__reset(stmt));
+ }
+ t_1 = apr_time_now();
+
+ {
+ int n = -1;
+ svn_boolean_t have_row;
+
+ SVN_ERR(svn_sqlite__get_statement(&stmt, sdb,
+ STMT_SELECT_LIKE_NO_ESCAPE));
+ SVN_ERR(svn_sqlite__bindf(stmt, "iss", wc_id, "300", "300/%"));
+
+ do
+ {
+ n++;
+ SVN_ERR(svn_sqlite__step(&have_row, stmt));
+ }
+ while (have_row);
+
+ SVN_TEST_ASSERT(n == 1025);
+ SVN_ERR(svn_sqlite__reset(stmt));
+ }
+ t_2 = apr_time_now();
+
+ {
+ int n = -1;
+ svn_boolean_t have_row;
+
+ SVN_ERR(svn_sqlite__get_statement(&stmt, sdb,
+ STMT_SELECT_SUBSTR));
+ SVN_ERR(svn_sqlite__bindf(stmt, "is", wc_id, "300"));
+
+ do
+ {
+ n++;
+ SVN_ERR(svn_sqlite__step(&have_row, stmt));
+ }
+ while (have_row);
+
+ SVN_TEST_ASSERT(n == 1025);
+ SVN_ERR(svn_sqlite__reset(stmt));
+ }
+ t_3 = apr_time_now();
+
+ {
+ int n = -1;
+ svn_boolean_t have_row;
+
+ SVN_ERR(svn_sqlite__get_statement(&stmt, sdb,
+ STMT_SELECT_BETWEEN));
+ SVN_ERR(svn_sqlite__bindf(stmt, "is", wc_id, "300"));
+
+ do
+ {
+ n++;
+ SVN_ERR(svn_sqlite__step(&have_row, stmt));
+ }
+ while (have_row);
+
+ SVN_TEST_ASSERT(n == 1025);
+ SVN_ERR(svn_sqlite__reset(stmt));
+ }
+ t_4 = apr_time_now();
+
+ {
+ int n = -1;
+ svn_boolean_t have_row;
+
+ SVN_ERR(svn_sqlite__get_statement(&stmt, sdb,
+ STMT_SELECT_BETWEEN2));
+ SVN_ERR(svn_sqlite__bindf(stmt, "isss", wc_id, "300", "300/", "3000"));
+
+ do
+ {
+ n++;
+ SVN_ERR(svn_sqlite__step(&have_row, stmt));
+ }
+ while (have_row);
+
+ SVN_TEST_ASSERT(n == 1025);
+ SVN_ERR(svn_sqlite__reset(stmt));
+ }
+ t_5 = apr_time_now();
+
+ s_1 += (t_1 - t_0);
+ s_2 += (t_2 - t_1);
+ s_3 += (t_3 - t_2);
+ s_4 += (t_4 - t_3);
+ s_5 += (t_5 - t_4);
+
+ SVN_DBG(("%d: %d, %d, %d, %d, %d\n", (int)(t_inserted - t_init),
+ (int)(t_1 - t_0),
+ (int)(t_2 - t_1),
+ (int)(t_3 - t_2),
+ (int)(t_4 - t_3),
+ (int)(t_5 - t_4)));
+ }
+
+ SVN_DBG(("AVG: %d, %d, %d, %d, %d\n",
+ (int)(s_1 / 100),
+ (int)(s_2 / 100),
+ (int)(s_3 / 100),
+ (int)(s_4 / 100),
+ (int)(s_5 / 100)));
+
+ return SVN_NO_ERROR;
+}
+
+
struct svn_test_descriptor_t test_funcs[] =
{
SVN_TEST_NULL,
@@ -1604,5 +1817,9 @@ struct svn_test_descriptor_t test_funcs[] =
"work queue processing"),
SVN_TEST_PASS2(test_externals_store,
"externals store"),
+#ifdef SVN_DB_TESTS__SQLITE_PERFORMANCE
+ SVN_TEST_PASS2(test_sqlite_select_performance,
+ "sqlite select performance scenarios"),
+#endif
SVN_TEST_NULL
};
Index: dev/subversion/tests/libsvn_wc/db_test_statements.sql
===================================================================
--- dev/subversion/tests/libsvn_wc/db_test_statements.sql (revision 0)
+++ dev/subversion/tests/libsvn_wc/db_test_statements.sql (revision 0)
@@ -0,0 +1,57 @@
+/* db_test_statements.sql -- queries use by the WC-DB test application
+ * This is intended for use with SQLite 3
+ *
+ * ====================================================================
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ * ====================================================================
+ */
+
+-- STMT_INSERT_STUPID_NODE
+INSERT INTO NODES
+ (wc_id, local_relpath, parent_relpath, presence, op_depth, kind)
+VALUES (?1, ?2, ?3, 'normal', 0, 'dir')
+
+-- STMT_SELECT_LIKE_ESCAPED
+ SELECT local_relpath FROM NODES
+ WHERE wc_id=?1 AND op_depth=0
+ AND ((local_relpath=?2)
+ OR (local_relpath LIKE ?3 ESCAPE '#'))
+
+-- STMT_SELECT_LIKE_NO_ESCAPE
+ SELECT local_relpath FROM NODES
+ WHERE wc_id=?1 AND op_depth=0
+ AND ((local_relpath=?2)
+ OR (local_relpath LIKE ?3))
+
+-- STMT_SELECT_SUBSTR
+SELECT local_relpath FROM NODES
+ WHERE wc_id=?1 AND op_depth=0
+ AND ((local_relpath=?2)
+ OR (SUBSTR(local_relpath, 1, LENGTH(?2)+1) = ?2 || '/'))
+
+-- STMT_SELECT_BETWEEN
+SELECT local_relpath FROM NODES
+ WHERE wc_id=?1 AND op_depth=0
+ AND ((local_relpath=?2)
+ OR (local_relpath > ?2 || '/' AND local_relpath < ?2 || '0'))
+
+-- STMT_SELECT_BETWEEN2
+SELECT local_relpath FROM NODES
+ WHERE wc_id=?1 AND op_depth=0
+ AND ((local_relpath=?2)
+ OR (local_relpath > ?3 AND local_relpath < ?4))
\ No newline at end of file
Index: dev/subversion/include/private/svn_sqlite.h
===================================================================
--- dev/subversion/include/private/svn_sqlite.h (revision 1103708)
+++ dev/subversion/include/private/svn_sqlite.h (working copy)
@@ -113,7 +113,7 @@ svn_sqlite__set_schema_version(svn_sqlite__db_t *d
The statements will be finalized and the SQLite database will be closed
when RESULT_POOL is cleaned up. */
svn_error_t *
-svn_sqlite__open(svn_sqlite__db_t **db, const char *repos_path,
+svn_sqlite__open(svn_sqlite__db_t **db, const char *path,
svn_sqlite__mode_t mode, const char * const statements[],
int latest_schema, const char * const *upgrade_sql,
apr_pool_t *result_pool, apr_pool_t *scratch_pool);
Index: dev/subversion/libsvn_client/externals.c
Index: dev/build/transform_sql.py
===================================================================
--- dev/build/transform_sql.py (revision 1103708)
+++ dev/build/transform_sql.py (working copy)
@@ -53,31 +53,31 @@ class Processor(object):
re_include = re.compile('-- *include: *([-a-z]+)')
re_define = re.compile('-- *define: *([A-Z_0-9]+)')
- def _sub_format(self, match):
+ def _sub_format(self, match, prefix):
vsn = match.group(1)
self.close_define()
self.output.write('#define %s_%s \\\n' % (self.var_name, match.group(1)))
self.var_printed = True
- def _sub_statement(self, match):
+ def _sub_statement(self, match, prefix):
name = match.group(1)
self.close_define()
self.output.write('#define STMT_%s %d\n' % (match.group(1),
self.stmt_count))
- self.output.write('#define STMT_%d \\\n' % (self.stmt_count,))
+ self.output.write('#define STMT_%s_%d \\\n' % (prefix, self.stmt_count,))
self.var_printed = True
self.stmt_count += 1
- def _sub_include(self, match):
+ def _sub_include(self, match, prefix):
filepath = os.path.join(self.dirpath, match.group(1) + '.sql')
self.close_define()
- self.process_file(open(filepath).read())
+ self.process_file(open(filepath).read(), prefix)
- def _sub_define(self, match):
+ def _sub_define(self, match, prefix):
define = match.group(1)
self.output.write(' APR_STRINGIFY(%s) \\\n' % define)
@@ -97,7 +97,7 @@ class Processor(object):
self.re_define : self._sub_define,
}
- def process_file(self, input):
+ def process_file(self, input, prefix):
input = self.re_comments.sub('', input)
for line in input.split('\n'):
@@ -109,7 +109,7 @@ class Processor(object):
for regex, handler in self._directives.iteritems():
match = regex.match(line)
if match:
- handler(match)
+ handler(match, prefix)
handled = True
break
@@ -141,6 +141,8 @@ def main(input_filepath, output):
var_name = re.sub('[-.]', '_', filename).upper()
+ prefix = re.sub('[^A-Z]','', var_name)
+
output.write(
'/* This file is automatically generated from %s.\n'
' * Do not edit this file -- edit the source and rerun gen-make.py */\n'
@@ -148,7 +150,7 @@ def main(input_filepath, output):
% (filename,))
proc = Processor(os.path.dirname(input_filepath), output, var_name)
- proc.process_file(input)
+ proc.process_file(input, prefix)
### the STMT_%d naming precludes *multiple* transform_sql headers from
### being used within the same .c file. for now, that's more than fine.
@@ -159,7 +161,8 @@ def main(input_filepath, output):
output.write(
'#define %s_DECLARE_STATEMENTS(varname) \\\n' % (var_name,)
+ ' static const char * const varname[] = { \\\n'
- + ', \\\n'.join(' STMT_%d' % (i,) for i in range(proc.stmt_count))
+ + ', \\\n'.join(' STMT_%s_%d' % (prefix, i,)
+ for i in range(proc.stmt_count))
+ ', \\\n NULL \\\n }\n')