Re: [sqlite] Fastest way to SELECT on a set of keys?
> On 17 Sep 2019, at 04:26, Keith Medcalf wrote:
> > On Monday, 16 September, 2019 14:22, E. Pasma wrote:
> >> Stop stop stop

You are right. What a difference a spelling error makes ... No wonder it took so long: it was doing table scans, and the optimizer was doing a jolly job in the other cases in dealing with it.

Note that the sqlite3 wrapper cannot do .executemany() with SELECT statements, but it will do them with INSERT statements. Nevertheless, the results are reasonably similar to those obtained with APSW:

st 1000
Creating db and sample keys: 100 rows; 1000 keys
Method 1: Individual Row          00:00:00.019247
Method 2: Individual Row (Sorted) 00:00:00.017748
Method 3: Rows with ExecMany      00:00:00.016084
Method 3: Rows with ExecMany Sort 00:00:00.015766
Method 4: Using IN temp           00:00:00.007528
Method 5: Using IN temp (sorted)  00:00:00.007821
Method 6: Using IN temp no rowid  00:00:00.007600
Method 7: Using IN (dynamic)      00:00:00.005317
Method 8: Using IN (sorted)       00:00:00.004884
Method 9: Using IN CArray         00:00:00.005081
Method A: Using IN CArray sorted  00:00:00.005190

Using this code:

#! python3

import apsw
import datetime
import random
import sqlite3
import sys
import time

datasize = 100
rows = int(sys.argv[1])

elapsed = lambda st, et: datetime.datetime.utcfromtimestamp((et - st)).time()
tuplize = lambda x: (x,)

db = apsw.Connection(':memory:')
#db = sqlite3.connect(':memory:', isolation_level=None)

print('Creating db and sample keys:', end=' ', flush=True)
db.executescript('''
create table x
(
    id integer primary key,
    datablob
);
insert into x
with a(x) as (
    select 1
    union all
    select x + 1
      from a
     where x < %d
)
select x, randomblob(30)
  from a;
analyze;
''' % (datasize,))
print(db.execute('select count(*) from x').fetchone()[0], 'rows;', end=' ')

rowset = [i for i in range(datasize)]
random.shuffle(rowset)
rowset = rowset[:rows]
print(len(rowset), 'keys')

print('Method 1: Individual Row         ', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for key in rowset:
    row = db.execute('select * from x where id=?', (key,)).fetchone()
db.commit()
print(elapsed(st, time.time()))

print('Method 2: Individual Row (Sorted)', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for key in sorted(rowset):
    row = db.execute('select * from x where id=?', (key,)).fetchone()
db.commit()
print(elapsed(st, time.time()))

print('Method 3: Rows with ExecMany     ', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for row in db.executemany('select * from x where id=?', list(map(tuplize, rowset))):
    pass
db.commit()
print(elapsed(st, time.time()))

print('Method 3: Rows with ExecMany Sort', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
for row in db.executemany('select * from x where id=?', list(map(tuplize, sorted(rowset)))):
    pass
db.commit()
print(elapsed(st, time.time()))

print('Method 4: Using IN temp          ', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
db.executescript('create temporary table keys (key)')
db.executemany('insert into keys values (?)', list(map(tuplize, rowset)))
for row in db.execute('select * from x where id in temp.keys'):
    pass
db.executescript('drop table temp.keys')
db.commit()
print(elapsed(st, time.time()))

print('Method 5: Using IN temp (sorted) ', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
db.executescript('create temporary table keys (key)')
db.executemany('insert into keys values (?)', list(map(tuplize, sorted(rowset))))
for row in db.execute('select * from x where id in temp.keys'):
    pass
db.executescript('drop table temp.keys')
db.commit()
print(elapsed(st, time.time()))

print('Method 6: Using IN temp no rowid ', end=' ', flush=True)
st = time.time()
db.executescript('BEGIN')
db.executescript('create temporary table keys (key primary key) without rowid')
db.executemany('insert or ignore into keys values (?)', list(map(tuplize, sorted(rowset))))
for row in db.execute('select * from x where id in temp.keys'):
    pass
db.executescript('drop table temp.keys')
db.commit()
print(elapsed(st, time.time()))

print('Method 7: Using IN (dynamic)     ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' + ','.join(map(str, rowset)) + ')'):
    pass
print(elapsed(st, time.time()))

print('Method 8: Using IN (sorted)      ', end=' ', flush=True)
st = time.time()
for row in db.execute('select * from x where id in (' + ','.join(map(str, sorted(rowset))) + ')'):
    pass
print(elapsed(st, time.time()))

...
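The winning approach above, a dynamically built IN list, can also be written with bound parameters instead of string concatenation; a minimal standalone sketch using the sqlite3 module, with reduced, made-up sizes:

```python
import random
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE x (id INTEGER PRIMARY KEY, datablob)')
db.executemany('INSERT INTO x VALUES (?, randomblob(30))',
               [(i,) for i in range(100)])

keys = random.sample(range(100), 10)

# One "?" per key; binding the values avoids quoting issues and
# works for key types other than integers too.
placeholders = ','.join('?' * len(keys))
rows = db.execute('SELECT * FROM x WHERE id IN (%s)' % placeholders,
                  sorted(keys)).fetchall()
print(len(rows))
```

Note that the number of bound parameters per statement is capped by SQLITE_MAX_VARIABLE_NUMBER (999 in older builds, 32766 since SQLite 3.32.0), so very large key sets are better served by the temp-table or carray methods.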
[sqlite] generate_series can theoretically behave differently in SQLite 3.23.0
Hello, below is a case where the generate_series virtual table behaves differently with SQLite 3.23. It is a purely artificial case. Does anyone have an opinion about it?

The issue comes from the fact that generate_series does not check whether a supposed integer input value happens to be NULL, and continues as if zero was passed in. This way a WHERE condition using "=" can be true where regular tables would require "IS" instead. Theoretically the left join reduction prover can now be misled, as is the case below. Is this an (obscure) bug in generate_series?

E. Pasma

.version
SQLite 3.23.0 2018-03-24 13:24:02 cf171abe954a5f25262161dd69f2e8cecdbf9446c3f6b298201507dbc743567e
zlib version 1.2.3
gcc-4.0.1 (Apple Inc. build 5465)
.load series
create table t1(x);
create table t2(x,y);
insert into t1 values(1),(2);
insert into t2 values(1,4);
select t1.x, value
from t1 left join t2 using (x)
join generate_series on start=t2.y and stop=5;
1|4
1|5

Same script in an earlier version, with echo off:

.version
SQLite 3.22.0 2018-01-09 15:28:25 6b3f4349d7156501f05270d494d6002ee03a0860b6e77b46d17effcab027b753
.echo off
1|4
1|5
2|0
2|1
2|2
2|3
2|4
2|5

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
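The "=" versus "IS" distinction that the report hinges on can be checked without loading the series extension; a minimal sketch in Python's sqlite3, reusing the toy tables from the post:

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript('''
CREATE TABLE t1(x);
CREATE TABLE t2(x, y);
INSERT INTO t1 VALUES (1), (2);
INSERT INTO t2 VALUES (1, 4);
''')

# On a regular table the unmatched left-join row carries y = NULL,
# and "=" against NULL is never true, while "IS" still compares.
eq = db.execute('SELECT NULL = 0').fetchone()[0]       # NULL (None)
is_cmp = db.execute('SELECT NULL IS 0').fetchone()[0]  # 0 (false, not NULL)
rows = db.execute(
    'SELECT t1.x, t2.y FROM t1 LEFT JOIN t2 USING (x)').fetchall()
print(eq, is_cmp, rows)
```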
Re: [sqlite] The upcoming 3.23.0 release
Dear Richard, the rather complicated script below used to print a tree:

node1|node2|node3|node4|node5
1||||
1|11|||
1|12|||
1|12|121||
1|13|||

and with yesterday's pre-release snapshot it does not yield any result. A clue: this depends on the size of the query, not so much on the logic. If you reduce the number of anticipated levels, by leaving out the last four lines, the output is alright. Hope this causes no headache, E. Pasma

.version
SQLite 3.23.0 2018-03-22 12:00:43 dd568c27b1d7656388ea5b4132cc0265aedd7348d265d8e8c7412b00b28a31aa
zlib version 1.2.3
gcc-4.0.1 (Apple Inc. build 5465)
create table node (node integer primary key, parent integer);
insert into node values (1,0),(11,1),(12,1),(13,1),(121,12);
create index node_parent on node (parent);
create table bit (bit integer primary key);
insert into bit values (0),(1);
SELECT node1, node2, node3, node4, node5
/* level 1 is root */
FROM (SELECT node AS node1 FROM node WHERE parent=0)
/* level 2 */
JOIN (SELECT bit AS bit1 FROM bit)
LEFT JOIN (SELECT node AS node2, parent AS parent2 FROM node)
  ON bit1 AND parent2=node1
/* level 3 */
JOIN (SELECT bit AS bit2 FROM bit)
  ON bit2<=CASE WHEN bit1 THEN CASE WHEN node2 THEN 1 ELSE -1 END ELSE 0 END
LEFT JOIN (SELECT node AS node3, parent AS parent3 FROM node)
  ON bit2 AND parent3=node2
/* level 4 */
JOIN (SELECT bit AS bit3 FROM bit)
  ON bit3<=CASE WHEN bit2 THEN CASE WHEN node3 THEN 1 ELSE -1 END ELSE 0 END
LEFT JOIN (SELECT node AS node4, parent AS parent4 FROM node)
  ON bit3 AND parent4=node3
/* level 5 */
JOIN (SELECT bit AS bit4 FROM bit)
  ON bit4<=CASE WHEN bit3 THEN CASE WHEN node4 THEN 1 ELSE -1 END ELSE 0 END
LEFT JOIN (SELECT node AS node5, parent AS parent5 FROM node)
  ON bit4 AND parent5=node4
;
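As an aside, the tree listing itself can be produced with a recursive CTE instead of the fixed five join levels; a sketch on the same node table (it emits one path string per row rather than five columns, so it is an alternative formulation, not the same query):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript('''
CREATE TABLE node (node INTEGER PRIMARY KEY, parent INTEGER);
INSERT INTO node VALUES (1,0),(11,1),(12,1),(13,1),(121,12);
''')

# Walk from the root downwards; sorting on the accumulated path
# reproduces the tree-shaped listing from the report.
rows = db.execute('''
    WITH RECURSIVE walk(node, path) AS (
        SELECT node, CAST(node AS TEXT) FROM node WHERE parent = 0
        UNION ALL
        SELECT n.node, walk.path || '|' || n.node
        FROM node AS n JOIN walk ON n.parent = walk.node
    )
    SELECT path FROM walk ORDER BY path
''').fetchall()
for (path,) in rows:
    print(path)
```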
[sqlite] View is not flattened when inside an IN sub-query (incl complete sql)
Hello, below are two equivalent delete statements. The difference is that the second version uses views, actually sub-queries, on the base tables. These are simple one-to-one views that could be flattened out, as in http://www.sqlite.org/optoverview.html#flattening . The second query plan has a full scan on t2 instead of direct access via the existing index sqlite_autoindex_t2_1. The view is apparently not flattened. Is it possible to change this? Thanks, E. Pasma

.version
SQLite 3.21.0 2017-10-02 02:52:54 c9104b59c7ed360291f7f6fc8caae938e9840c77620d598e4096f78183bf807a
create table t1 (a, b, primary key(a,b));
create table t2 (b, c, primary key(b,c));
create table t3 (a, c, primary key(a,c));
.eqp on
delete from t3
where (a,c) in (
    select a,c
    from t1
    join t2 using (b)
);
--EQP-- 0,0,0,SEARCH TABLE t3 USING INDEX sqlite_autoindex_t3_1 (a=? AND c=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SCAN TABLE t1
--EQP-- 0,1,1,SEARCH TABLE t2 USING COVERING INDEX sqlite_autoindex_t2_1 (b=?)
delete from t3
where (a,c) in (
    select a,c
    from (select a,b from t1)
    join (select b,c from t2) using (b)
);
--EQP-- 0,0,0,SEARCH TABLE t3 USING INDEX sqlite_autoindex_t3_1 (a=? AND c=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 1,0,0,SCAN TABLE t1
--EQP-- 2,0,0,SCAN TABLE t2
--EQP-- 0,0,0,SCAN SUBQUERY 1
--EQP-- 0,1,1,SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX (b=?)
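Whether a given build flattens the sub-queries can be checked directly from code; a sketch that prints both query plans via EXPLAIN QUERY PLAN (the plan text varies by SQLite version, so none is reproduced here):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript('''
CREATE TABLE t1 (a, b, PRIMARY KEY (a, b));
CREATE TABLE t2 (b, c, PRIMARY KEY (b, c));
CREATE TABLE t3 (a, c, PRIMARY KEY (a, c));
''')

statements = (
    'DELETE FROM t3 WHERE (a, c) IN '
    '(SELECT a, c FROM t1 JOIN t2 USING (b))',
    'DELETE FROM t3 WHERE (a, c) IN '
    '(SELECT a, c FROM (SELECT a, b FROM t1) '
    'JOIN (SELECT b, c FROM t2) USING (b))',
)

plans = []
for sql in statements:
    # The last column of each EXPLAIN QUERY PLAN row is the detail text.
    plan = db.execute('EXPLAIN QUERY PLAN ' + sql).fetchall()
    plans.append(plan)
    print(sql)
    for row in plan:
        print('  ', row[-1])
```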
Re: [sqlite] Slow query, with correlated sub-sub-query
Keith, this definitely explains the observed time, as it is proportional to count(a)*count(ab)**2, thus non-linear. And a correlated sub-query is generally recalculated for each row. But I do not agree with everything. In my example it is correlated to the outermost query, and not to the sub-query in which it occurs. Theoretically the optimizer can take this into account and only recalculate for each row in the outermost query. And if I'm not mistaken, Postgres does so. Below is a version modified for pgsql that runs fast no matter the number of rows.

Thanks for the suggested change, where the minimum size is computed in a sub-query (not sub-sub) and joined to the other sub-query. This is so elegant. I still need to compare the timing to David's version and use the fastest.

/* sudo -u postgres psql < issue2p.sql */
drop table if exists a;
drop table if exists ab;
create table a (a int, primary key (a));
create table ab (a int, b int, size int, primary key (a,b));
insert into a
with recursive i as (select 1 as i union all select i+1 from i where i<3)
select i from i;
insert into ab
with recursive i as (select 1 as i union all select i+1 from i where i<1)
select a, i as b, (a+i)%10 as size from a, i;
select a,
       ( select count(*)
           from ab
          where a=a.a
            and size=(select min(size) from ab where a=a.a)
       )
  from a;

Keith Medcalf wrote:

Well of course. You are aware that a correlated subquery means "for each candidate result, execute the query"? So as you have formulated the query, it means:

for each row in a
    compute the result count
        which for each ab candidate row
            calculate whether it is the minimum

which means that you have requested that the same result be computed many times over. You have requested examination of count(a) * count(ab) * count(ab) rows.
Instead you should be computing the min(size) for each group of a once, and using that value in the correlated subquery:

select a.a,
       ( select count(*)
           from ab
          where a == a.a
            and size == b.size
       ) as acount
  from a,
       ( select a, min(size) as size from ab group by a ) as b
 where a.a == b.a;

This will result in scanning count(ab) + count(a) * count(ab) rows, which is significantly less. On my computer it reduces the execution time of the original query you posited from 400 ticks to less than 1 tick (i.e., from 500 ms to <8 ms).

I do not know if any optimizer can flatten your original query to any significant degree. Some optimizers may arrive at my fixed-up query because they are capable of doing a hash table lookup on the result of the innermost correlate. SQLite does not do that, and without that capability I do not think there is a relational database query optimizer on the planet that can help you.
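Keith's rewrite is easy to check on a toy dataset; a sketch with made-up rows (group 1 has two rows at its minimum size, group 2 has one):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript('''
CREATE TABLE a (a INTEGER PRIMARY KEY);
CREATE TABLE ab (a INT, b INT, size INT, PRIMARY KEY (a, b));
INSERT INTO a VALUES (1), (2);
INSERT INTO ab VALUES (1,1,5),(1,2,5),(1,3,9),(2,1,7),(2,2,8);
''')

# min(size) per group is computed once in the derived table b; the
# correlated count then only has to match that precomputed value.
rows = db.execute('''
    SELECT a.a,
           (SELECT count(*) FROM ab
             WHERE a = a.a AND size = b.size) AS acount
      FROM a,
           (SELECT a, min(size) AS size FROM ab GROUP BY a) AS b
     WHERE a.a = b.a
''').fetchall()
print(sorted(rows))
```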
[sqlite] obscure bug: duplicate rowid after insert or replace
Hello, I have a duplicate rowid in a 3.16.2 database, and this is essentially produced as follows:

CREATE TABLE t(i INTEGER PRIMARY KEY, a TEXT);
INSERT INTO t VALUES (NULL, 'generates row 1');
REPLACE INTO t VALUES (NULL, 'generates row 2'), (1, 'replaces row 1');

It is alright after changing the order of the multiple values. Hope I'm not mistaken. Thanks, E. Pasma
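On a current SQLite build the same script yields unique rowids again; a sketch that checks the invariant (the 3.16.2 misbehavior itself cannot be reproduced with a fixed library):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript('''
CREATE TABLE t (i INTEGER PRIMARY KEY, a TEXT);
INSERT INTO t VALUES (NULL, 'generates row 1');
REPLACE INTO t VALUES (NULL, 'generates row 2'), (1, 'replaces row 1');
''')

# The NULL in the multi-row REPLACE gets a fresh rowid (2), and the
# explicit 1 then replaces the first row, so the ids stay unique.
rows = db.execute('SELECT i, a FROM t ORDER BY i').fetchall()
print(rows)
```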
Re: [sqlite] group_replace
On 11 Aug 2016, at 13:10, Anthony Lansbergen wrote:
> Hello, I needed a way to make parameterized texts for logging in my current project. For this purpose I missed a function in sqlite: group_replace, so I made an extension. The function is just like group_concat, an aggregate function, but it replaces keys with values in a string instead of concatenating strings. I put the code online; it's publicly available at: https://github.com/adesys/sqlite3_group_replace_extension
> It seems to work fine, but since this is my first extension, can someone please take a look at it and check if it is bug free :-)
> thanks in advance, Anthony Lansbergen

Hello, I am not very experienced, but I do have an easy test to check for memory leaks. It runs on OS X. For group_replace I tested the script below, and it shows increasing memory use. Two remarks/questions:

1. Better use sqlite3_malloc and sqlite3_free instead of the C primitives.
2. Is it imaginable to obtain the same functionality without a C extension, by using plain replace() inside a recursive CTE?

Thanks, E. Pasma

.load ./sqlite3_group_replace_extension
create table kv (k, v, reverse);
insert into kv values ('1', 'one', 0), ('2', 'two', 0), ('3', 'knock knock knock', 0);
insert into kv select v, k, 1 from kv;
create view v as
with r as (
    select 0 as i, '1 2 3 4 5 6 7 8 9' as s
    union all
    select i+1, (select group_replace(s, k, v) from kv where reverse=i%2)
    from r where i<10
)
select * from r where i>=10-1;
select * from v;
.system ps -o vsz -o command | grep sqlite3$
select * from v;
.system ps -o vsz -o command | grep sqlite3$
select * from v;
.system ps -o vsz -o command | grep sqlite3$
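On the second question: replace() inside a recursive CTE does work when the key/value rows are numbered and applied one at a time; a sketch with hypothetical keys (replacements are applied sequentially, so overlapping keys behave differently than in a single aggregate pass):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript('''
CREATE TABLE kv (i INTEGER PRIMARY KEY, k TEXT, v TEXT);
INSERT INTO kv VALUES (1, '1', 'one'), (2, '2', 'two'), (3, '3', 'three');
''')

# Each recursion step applies the next numbered replacement to the
# accumulated string; the last row holds the fully substituted text.
(result,) = db.execute('''
    WITH RECURSIVE r(i, s) AS (
        SELECT 0, '1 2 3'
        UNION ALL
        SELECT r.i + 1, replace(r.s, kv.k, kv.v)
        FROM r JOIN kv ON kv.i = r.i + 1
    )
    SELECT s FROM r ORDER BY i DESC LIMIT 1
''').fetchone()
print(result)
```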
[sqlite] special aggregate feature vs query optimization
Hello, this mail is about the aggregate feature that was unveiled in the release log of version 3.7.11, http://www.sqlite.org/releaselog/3_7_11.html :

"A query of the form: 'SELECT max(x), y FROM table' returns the value of y on the same row that contains the maximum x value."

I just want to point to a construction where one would expect this to work, but it does not. I tried a query that returns only the value of y and intuitively wrote:

select y from (select max(x), y from t);

This, however, no longer returns the value of y corresponding to the maximum x. It looks like a consequence of query optimization: the query satisfies all conditions for "Query Flattening" in http://www.sqlite.org/optoverview.html , and the max(x) column is then eliminated. Hope this is useful to know for whoever is using the feature. E. Pasma

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
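The documented form of the feature is easy to demonstrate; a sketch (the wrapped form from the post is executed as well, but nothing is asserted about its result, since that is exactly the version-dependent case):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE t (x, y)')
db.executemany('INSERT INTO t VALUES (?, ?)',
               [(1, 'a'), (3, 'c'), (2, 'b')])

# Documented behavior: y is taken from the row holding the maximum x.
row = db.execute('SELECT max(x), y FROM t').fetchone()
print(row)

# The wrapped form from the post; whether the subquery is flattened
# (and the guarantee lost) depends on the SQLite version.
wrapped = db.execute('SELECT y FROM (SELECT max(x), y FROM t)').fetchone()
print(wrapped)
```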
Re: [sqlite] SQLite bug ticket - build fails on sun4-solaris-64int 2.10
On 2 Jan 2010, at 9:32, Roger Binns wrote:
> Darren Duncan wrote:
>> I would like to bring an apparent SQLite bug to the attention of the SQLite core developers as a ticket, where build fails on sun4-solaris-64int 2.10
>
> You'll find this is not a bug in SQLite.
>
>> cc: Fatal error in /opt/sunstudio12.1/prod/bin/cg
>> cc: Status 139
>> *** Error code 139
>
> That is the compiler crashing (signal 11, SIGSEGV). This sort of thing usually turns out to be an optimiser bug and likely won't happen if you disable optimisation, or compile the files individually rather than using the amalgamation. Alternatively use a working compiler like gcc.
>
> Incidentally three of your defines are dodgy:
>
> -DSQLITE_CORE
>
> There is never any need to specify this - all that stuff is handled internally.
>
> -DSQLITE_PTR_SZ=4
>
> That name is not used anywhere in the SQLite source I could find. Even if it was, implying 4 byte pointers on a 64 bit machine seems dangerous.
>
> -DTHREADSAFE=0
>
> Really? What is wrong (and less likely to cause the unwary grief) with the default of 1?
>
> The other flags seem to be guessed. There is no need to tell a 64 bit system that file offsets are 64 bits. The only 'have' is HAVE_USLEEP, but the system likely has LOCALTIME_R and GMTIME_R too, as well as several other header files.
>
> If you do not want to build SQLite using its build system, then the approach I take is to run SQLite's configure, grab the DEFS = line out of the resulting Makefile, and generate a .h file with the relevant -D turned into #defines. If you define _HAVE_SQLITE_CONFIG_H then SQLite will #include any config.h, so you can dump your #defines in there.
>
> Roger

Hello, just one remark with regard to the thread-safe option: the default setting has changed in the configure script of version 3.6.21. It has now become 0, and the --enable-threadsafe argument is required to configure for threadsafety (strongly recommended for shared cache mode). Regards, Edzard Pasma
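Whether a given library was actually built threadsafe can be checked at run time via PRAGMA compile_options, which records the THREADSAFE setting; a small sketch:

```python
import sqlite3

db = sqlite3.connect(':memory:')
# compile_options reports one option per row, without the SQLITE_ prefix,
# e.g. "THREADSAFE=1".
opts = [row[0] for row in db.execute('PRAGMA compile_options')]
print([o for o in opts if o.startswith('THREADSAFE')])
```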