Re: [sqlite] Min/Max and skip-scan optimizations
Do you perhaps want this: select source1, source2, ( select min(ts) from rolling where source1 = x.source1 and source2 = x.source2 ) from ( select distinct source1, source2 from rolling ) as x; SQLite version 3.27.0 2019-01-28 00:42:06 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> .timer on sqlite> CREATE TABLE `rolling` ( ...> `source1`TEXT NOT NULL, ...> `source2`TEXT NOT NULL, ...> `ts`INTEGER NOT NULL, ...> `value`TEXT ...> ); Run Time: real 0.002 user 0.00 sys 0.00 sqlite> sqlite> CREATE INDEX `sources` ON `rolling` ( ...> `source1`, ...> `source2`, ...> `ts` ...> ); Run Time: real 0.001 user 0.00 sys 0.00 sqlite> sqlite> INSERT INTO rolling ...> WITH RECURSIVE ...> src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ), ...> src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2") UNION ALL ...> VALUES("X3") UNION ALL VALUES("X4") ), ...> cnt( ts, value) AS ( ...> VALUES( 0, "") ...> UNION ALL ...> SELECT ts+1, value FROM cnt LIMIT 100) ...> ...> select src1.source1, src2.source2, cnt.* from src1, src2, cnt; Run Time: real 8.920 user 8.843750 sys 0.078125 sqlite> sqlite> analyze; Run Time: real 1.285 user 1.281250 sys 0.00 sqlite> .eqp full sqlite> sqlite> select source1, ...>source2, ...>( ...> select min(ts) ...> from rolling ...> where source1 = x.source1 ...>and source2 = x.source2 ...>) ...> from ( ...> select distinct source1, ...> source2 ...> from rolling ...>) as x; QUERY PLAN |--CO-ROUTINE 2 | `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows) |--SCAN SUBQUERY 2 AS x (~7864320 rows) `--CORRELATED SCALAR SUBQUERY 1 `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=? AND source2=?) (~983040 rows) addr opcode p1p2p3p4 p5 comment - - -- - 0 Init 0 64000 Start at 64 1 InitCoroutine 1 23200 x 2 Null 1 4 008 r[4]=NULL 3 OpenRead 4 3 0 k(4) 00 root=3 iDb=0; sources 4 ColumnsUsed4 0 0 3 00 5 Explain5 0 0 SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows) 00 6 Noop 0 0 000 Begin WHERE-loop0: rolling 7 Rewind 4 212 0 00 8 Noop 0 0 000 Begin WHERE-core 9 Column 4 0 200 r[2]=rolling.source1 10Column 4 1 300 r[3]=rolling.source2 11Ne 2 134 (BINARY) 80 if r[4]!=r[2] goto 13 12Eq 3 205 (BINARY) 80 if r[5]==r[3] goto 20 13Copy 2 4 100 r[4..5]=r[2..3] 14Yield 1 0 000 15Noop 0 0 000 End WHERE-core 16Column 4 0 600 r[6]= 17Column 4 1 700 r[7]= 18SeekGT 4 216 2 00 key=r[6..7] 19 Goto 1 8 000 20Next 4 8 001 21Noop 0 0 000 End WHERE-loop0: rolling 22EndCoroutine 1 0 000 23Explain230 0 SCAN SUBQUERY 2 AS x (~7864320 rows) 00 24Noop 0 0 000 Begin WHERE-loop0: x 25InitCoroutine 1 0 200 26 Yield 1 62000 next row of x 27 Noop 0 0 000 Begin WHERE-core 28 Copy 2 9 000 r[9]=r[2]; x.source1 29 Copy 3 10000 r[10]=r[3]; x.source2 30 Null 0 1212 00 r[12..12]=NULL; Init subquery result 31 Integer1 13000 r[13]=1; LIMIT counter 32 Null 0 1415 00 r[14..15]=NULL 33 OpenRead 5 3 0 k(4) 00 root=3 iDb=0; sources 34 ColumnsUsed5 0 0 7 00 35 Explain35
[sqlite] Min/Max and skip-scan optimizations
Hi, I have a database table looking like this: CREATE TABLE `rolling` ( `source1`TEXT NOT NULL, `source2`TEXT NOT NULL, `ts`INTEGER NOT NULL, `value`TEXT ); CREATE INDEX `sources` ON `rolling` ( `source1`, `source2`, `ts` ); INSERT INTO rolling WITH RECURSIVE src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ), src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2") UNION ALL VALUES("X3") UNION ALL VALUES("X4") ), cnt( ts, value) AS ( VALUES( 0, "") UNION ALL SELECT ts+1, value FROM cnt LIMIT 100) select src1.source1, src2.source2, cnt.* from src1, src2, cnt; So the table looks like the following aaa|X1|0| aaa|X1|1| aaa|X1|98| aaa|X1|99| aaa|X2|0| aaa|X2|1| ... aaa|X2|98| aaa|X2|99| ... aaa|X4|99| bbb|X1|0| bbb|X1|1| bbb|X1|2| ... bbb|X4|99| So we have 2*4*1M = 8M rows, all indexed. The first two text columns (source1, source2) have a very limited set of possible values, while the third numeric column is a timestamp. So it's essentially a partitioned logbook, with a very small number of partitions. Given this particular structure, I would expect indexing to be extremely helpful, thanks to covering indexes, min/max and skip-scan optimizations. Below as some example queries that I would expect to be taking advantage of indexing. They're however sometimes not, so I'd like some help understanding if I'm getting something wrong. 1) SELECT DISTINCT source1, source2 FROM rolling; Provided the table has been analyzed, this will run real fast. Here I'm essentially try to get a recap of all distinct values for the first two columns of an index. So ideally I would just scan the covering index, skipping to the next values for the first two columns. I don't really understand why analyze is a precondition for this optimization to kick in, however I'm pretty happy about this. SUGGESTION: Perhaps this could also be added as an example in the description of the skip-scan optimization? 2) SELECT * FROM rolling WHERE ts < 100; Provided the table has been analyzed, this will run real fast as well. I believe the skip-scan optimization is responsible for making this instantaneous, since this looks like one of the canonical examples. 3) SELECT MIN(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2") Again, this will be almost instantaneous, and works regardless of whether the table has been analyzed or not. So it looks like the min/max optimization: >> Queries that contain a single MIN() or MAX() aggregate function whose argument is the left-most column of an index >> might be satisfied by doing a single index lookup rather than by scanning the entire table. Here the argument of the MIN() function is technically *not* the left-most column of an index, though imposing a certain value for the first two columns kind-of makes it the left-most one. SUGGESTION: Perhaps this could also be added as an example in the description of the min-max optimization? QUESTION: this feature -- locking tha value for left-most columns of an index, I believe it's WHERE CLAUSE analysis -- looks like a general feature does it apply to other cases as well? 4) SELECT MIN(ts), MAX(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2") This will *NOT* be optimized. I understand by reading from previous messages in this group that this is a known limitation of the Min/Max optimization. There are however known workarounds for it, like: SELECT (SELECT MIN(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2")), (SELECT MAX(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2")); SUGGESTION: Perhaps this could be added in the documentation? 5) SELECT source1,source2,MIN(ts) FROM rolling GROUP BY source1, source2 Here I would expect the query to be optimized somehow by taking advantage of both skip-scan and min/max optimizations, but I could not manage to find a way. This could be easily done in the application code by just nesting two separate queries: SELECT DISTINCT source1, source2 FROM rolling; -- This will return the set of sources available and will be quite fast SELECT MIN(ts) FROM rolling WHERE (source1,source2)=(?, ?) -- This will manually filter by the results returned by the previous one I tried everything that came to mind to run this as a single query, including ORDER BY, INDEXED BY, subqueries, joins, window functions, but everything just seemed to make things worse. QUESTION: Is there any way to force the query engine to take advantage of both optimizations? Thanks in advance! Gerlando ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Support for System.Data.SQLite: Different API type for int/integer columns
JP wrote: > > The answers don't explain why there is a bitness difference at run-time > between the types retrieved from INT and INTEGER columns, and that's my > question. From reading https://sqlite.org/datatype3.html I understand > there should be no difference whatsoever between defining a column INT > or INTEGER (other than whether a primary key may become a rowid alias). > System.Data.SQLite maps type names to their associated types. Mappings can be overridden on a per-connection basis. By default, INT maps to a 32-bit integer and INTEGER maps to a 64-bit integer, mostly for reasons of backward compatibility. In order to customize the mappings, the flag UseConnectionTypes should be used when opening the connection -AND- the AddTypeMapping method is used to override a built-in mapping of the same name (e.g. "INT"). -- Joe Mistachkin ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Support for System.Data.SQLite: Different API type for int/integer columns
On 24 Jan 2019, at 3:27pm, JP wrote: > I have encountered the same problem as this: > https://stackoverflow.com/q/4925084 What is returned by typeof(reader["myColumn"]) in each case ? I suspect that you're getting an integer when the column is declared as INTEGER, but not when the column is declared as INT. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Support for System.Data.SQLite: Different API type for int/integer columns
Hello, I have encountered the same problem as this: https://stackoverflow.com/q/4925084 The answers don't explain why there is a bitness difference at run-time between the types retrieved from INT and INTEGER columns, and that's my question. From reading https://sqlite.org/datatype3.html I understand there should be no difference whatsoever between defining a column INT or INTEGER (other than whether a primary key may become a rowid alias). I don't mean the bitness how the integers are stored in the disk database, but the values returned by the System.Data.SQLite API (in particular via a DataTable loaded by a SQLiteDataReader). I have verified that declaring a column (which isn't any kind of key) as INT causes System.Data.SQLite to return Int32/int (possibly, depending on the value? Not sure); and declaring INTEGER causes the same value to be returned as Int64/long. Can anyone explain this, or point to where this is actually documented, if I've missed it? It's the first time I use this mailing list, after searching for the answer on sqlite.org and around the Web; I hope I haven't missed any RTFM, otherwise please let me know. Thanks Xavier Porras ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users