Re: [sqlite] Min/Max and skip-scan optimizations

2019-01-27 Thread Keith Medcalf

Do you perhaps want this:

select source1, 
   source2, 
   (
select min(ts) 
  from rolling 
 where source1 = x.source1 
   and source2 = x.source2
   ) 
  from (
select distinct source1, 
source2 
  from rolling
   ) as x;
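
If both MIN(ts) and MAX(ts) are wanted, the same shape should extend with a
second correlated subquery (untested sketch combining this with the min/max
workaround mentioned in the original message; min_ts/max_ts are just
illustrative aliases):

select source1,
       source2,
       (select min(ts) from rolling
         where source1 = x.source1 and source2 = x.source2) as min_ts,
       (select max(ts) from rolling
         where source1 = x.source1 and source2 = x.source2) as max_ts
  from (select distinct source1, source2 from rolling) as x;

Each correlated subquery constrains the two leading index columns and
aggregates the third, so each should reduce to a single seek on the covering
index.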

SQLite version 3.27.0 2019-01-28 00:42:06
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .timer on
sqlite> CREATE TABLE `rolling` (
   ...> `source1`TEXT NOT NULL,
   ...> `source2`TEXT NOT NULL,
   ...> `ts`INTEGER NOT NULL,
   ...> `value`TEXT
   ...> );
Run Time: real 0.002 user 0.00 sys 0.00
sqlite>
sqlite> CREATE INDEX `sources` ON `rolling` (
   ...> `source1`,
   ...> `source2`,
   ...> `ts`
   ...> );
Run Time: real 0.001 user 0.00 sys 0.00
sqlite>
sqlite> INSERT INTO rolling
   ...> WITH RECURSIVE
   ...>   src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ),
   ...>   src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2") UNION ALL
   ...> VALUES("X3") UNION ALL VALUES("X4") ),
   ...>   cnt( ts, value) AS (
   ...>   VALUES( 0, "")
   ...> UNION ALL
   ...>   SELECT ts+1, value FROM cnt LIMIT 1000000)
   ...>
   ...> select src1.source1, src2.source2, cnt.* from src1, src2, cnt;
Run Time: real 8.920 user 8.843750 sys 0.078125
sqlite>
sqlite> analyze;
Run Time: real 1.285 user 1.281250 sys 0.00
sqlite> .eqp full
sqlite>
sqlite> select source1,
   ...>source2,
   ...>(
   ...> select min(ts)
   ...>   from rolling
   ...>  where source1 = x.source1
   ...>and source2 = x.source2
   ...>)
   ...>   from (
   ...> select distinct source1,
   ...> source2
   ...>   from rolling
   ...>) as x;
QUERY PLAN
|--CO-ROUTINE 2
|  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows)
|--SCAN SUBQUERY 2 AS x (~7864320 rows)
`--CORRELATED SCALAR SUBQUERY 1
   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     64    0                    00  Start at 64
1     InitCoroutine  1     23    2                    00  x
2     Null           1     4     0                    08  r[4]=NULL
3     OpenRead       4     3     0     k(4)           00  root=3 iDb=0; sources
4     ColumnsUsed    4     0     0     3              00
5     Explain        5     0     0     SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows)  00
6     Noop           0     0     0                    00  Begin WHERE-loop0: rolling
7     Rewind         4     21    2     0              00
8     Noop           0     0     0                    00  Begin WHERE-core
9     Column         4     0     2                    00  r[2]=rolling.source1
10    Column         4     1     3                    00  r[3]=rolling.source2
11    Ne             2     13    4     (BINARY)       80  if r[4]!=r[2] goto 13
12    Eq             3     20    5     (BINARY)       80  if r[5]==r[3] goto 20
13    Copy           2     4     1                    00  r[4..5]=r[2..3]
14    Yield          1     0     0                    00
15    Noop           0     0     0                    00  End WHERE-core
16    Column         4     0     6                    00  r[6]=
17    Column         4     1     7                    00  r[7]=
18    SeekGT         4     21    6     2              00  key=r[6..7]
19    Goto           1     8     0                    00
20    Next           4     8     0                    01
21    Noop           0     0     0                    00  End WHERE-loop0: rolling
22    EndCoroutine   1     0     0                    00
23    Explain        23    0     0     SCAN SUBQUERY 2 AS x (~7864320 rows)  00
24    Noop           0     0     0                    00  Begin WHERE-loop0: x
25    InitCoroutine  1     0     2                    00
26    Yield          1     62    0                    00  next row of x
27    Noop           0     0     0                    00  Begin WHERE-core
28    Copy           2     9     0                    00  r[9]=r[2]; x.source1
29    Copy           3     10    0                    00  r[10]=r[3]; x.source2
30    Null           0     12    12                   00  r[12..12]=NULL; Init subquery result
31    Integer        1     13    0                    00  r[13]=1; LIMIT counter
32    Null           0     14    15                   00  r[14..15]=NULL
33    OpenRead       5     3     0     k(4)           00  root=3 iDb=0; sources
34    ColumnsUsed    5     0     0     7              00
35    Explain        35

[sqlite] Min/Max and skip-scan optimizations

2019-01-27 Thread Gerlando Falauto
Hi,

I have a database table looking like this:

CREATE TABLE `rolling` (
`source1`TEXT NOT NULL,
`source2`TEXT NOT NULL,
`ts`INTEGER NOT NULL,
`value`TEXT
);

CREATE INDEX `sources` ON `rolling` (
`source1`,
`source2`,
`ts`
);

INSERT INTO rolling
WITH RECURSIVE
  src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ),
  src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2") UNION ALL
VALUES("X3") UNION ALL VALUES("X4") ),
  cnt( ts, value) AS (
  VALUES( 0, "")
UNION ALL
  SELECT ts+1, value FROM cnt LIMIT 1000000)

select src1.source1, src2.source2, cnt.* from src1, src2, cnt;


So the table looks like the following

aaa|X1|0|
aaa|X1|1|
...
aaa|X1|999998|
aaa|X1|999999|
aaa|X2|0|
aaa|X2|1|
...
aaa|X2|999998|
aaa|X2|999999|
...
aaa|X4|999999|
bbb|X1|0|
bbb|X1|1|
bbb|X1|2|
...
bbb|X4|999999|

So we have 2*4*1M = 8M rows, all indexed. The first two text columns
(source1, source2) have a very limited set of possible values, while the
third numeric column is a timestamp.
So it's essentially a partitioned logbook, with a very small number of
partitions.

Given this particular structure, I would expect indexing to be extremely
helpful, thanks to covering indexes, min/max and skip-scan optimizations.

Below are some example queries that I would expect to take advantage of
indexing.
However, sometimes they are not, so I'd like some help understanding whether
I'm getting something wrong.

1)  SELECT DISTINCT source1, source2 FROM rolling;

Provided the table has been analyzed, this will run real fast.
Here I'm essentially trying to get a recap of all distinct values for the
first two columns of an index.
So ideally I would just scan the covering index, skipping to the next
values for the first two columns.
I don't really understand why ANALYZE is a precondition for this
optimization to kick in; however, I'm pretty happy about this.

SUGGESTION: Perhaps this could also be added as an example in the
description of the skip-scan optimization?
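
For what it's worth, a simple way to see whether it kicks in is to compare the
plan and the timing with and without ANALYZE in the shell (just a sketch; the
exact plan text varies between SQLite versions):

.timer on
EXPLAIN QUERY PLAN SELECT DISTINCT source1, source2 FROM rolling;
SELECT DISTINCT source1, source2 FROM rolling;
ANALYZE;
EXPLAIN QUERY PLAN SELECT DISTINCT source1, source2 FROM rolling;
SELECT DISTINCT source1, source2 FROM rolling;
-- compare the two plans and run times before and after ANALYZE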

2) SELECT * FROM rolling WHERE ts < 100;

Provided the table has been analyzed, this will run real fast as well.
I believe the skip-scan optimization is responsible for making this
instantaneous, since this looks like one of the canonical examples.
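
Conceptually, the skip-scan seems to behave as if the partitions were
enumerated by hand and each one searched separately, roughly like this
(illustration only, not the query I actually run):

SELECT r.*
  FROM (SELECT DISTINCT source1, source2 FROM rolling) AS p
  JOIN rolling AS r
    ON r.source1 = p.source1
   AND r.source2 = p.source2
   AND r.ts < 100;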

3) SELECT MIN(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2")

Again, this will be almost instantaneous, and works regardless of whether
the table has been analyzed or not.

So it looks like the min/max optimization:

 >> Queries that contain a single MIN() or MAX() aggregate function whose
argument is the left-most column of an index
 >> might be satisfied by doing a single index lookup rather than by
scanning the entire table.

Here the argument of the MIN() function is technically *not* the left-most
column of an index, though imposing a certain
value for the first two columns kind-of makes it the left-most one.
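
In other words, once source1 and source2 are pinned by equality constraints,
ts is effectively the left-most *unconstrained* column of the index, so the
plain-AND spelling (which should be equivalent to the row-value form) can be
answered with a single index seek:

SELECT MIN(ts) FROM rolling
 WHERE source1 = 'aaa'   -- equality on the two leading index columns
   AND source2 = 'X2';   -- leaves ts as the next column, so MIN(ts) is one seek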

SUGGESTION: Perhaps this could also be added as an example in the
description of the min-max optimization?

QUESTION: this feature -- locking the value for the left-most columns of an
index (I believe it's WHERE-clause analysis) -- looks like a general feature;
does it apply to other cases as well?

4) SELECT MIN(ts), MAX(ts) FROM rolling WHERE (source1,source2)=("aaa",
"X2")

This will *NOT* be optimized. I understand from previous messages on this
list that this is a known limitation of the min/max optimization. There are,
however, known workarounds for it, like:

SELECT (SELECT MIN(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2")),
(SELECT MAX(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2"));

SUGGESTION: Perhaps this could be added in the documentation?
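
Another workaround that should also reduce to two index seeks (a sketch, not
taken from the documentation) is to spell the two endpoints as ORDER BY/LIMIT
subqueries, which the (source1, source2, ts) index can satisfy directly:

SELECT (SELECT ts FROM rolling
         WHERE source1 = 'aaa' AND source2 = 'X2'
         ORDER BY ts ASC  LIMIT 1) AS min_ts,   -- first entry in the (aaa, X2) range
       (SELECT ts FROM rolling
         WHERE source1 = 'aaa' AND source2 = 'X2'
         ORDER BY ts DESC LIMIT 1) AS max_ts;   -- last entry in the (aaa, X2) range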

5) SELECT source1,source2,MIN(ts) FROM rolling GROUP BY source1, source2

Here I would expect the query to be optimized somehow by taking advantage
of both skip-scan and min/max optimizations,
but I could not manage to find a way.

This could be easily done in the application code by just nesting two
separate queries:

SELECT DISTINCT source1, source2 FROM rolling; -- This will return the set
of sources available and will be quite fast

SELECT MIN(ts) FROM rolling WHERE (source1,source2)=(?, ?) -- This will
manually filter by the results returned by the previous one

I tried everything that came to mind to run this as a single query,
including ORDER BY, INDEXED BY, subqueries, joins, window functions,
but everything just seemed to make things worse.

QUESTION: Is there any way to force the query engine to take advantage of
both optimizations?

Thanks in advance!
Gerlando


Re: [sqlite] Support for System.Data.SQLite: Different API type for int/integer columns

2019-01-27 Thread Joe Mistachkin

JP wrote:
>
> The answers don't explain why there is a bitness difference at run-time
> between the types retrieved from INT and INTEGER columns, and that's my
> question. From reading https://sqlite.org/datatype3.html I understand
> there should be no difference whatsoever between defining a column INT
> or INTEGER (other than whether a primary key may become a rowid alias). 
>

System.Data.SQLite maps type names to their associated types.  Mappings
can be overridden on a per-connection basis.  By default, INT maps to a
32-bit integer and INTEGER maps to a 64-bit integer, mostly for reasons
of backward compatibility.
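
At the level of SQLite itself the two declarations are indistinguishable: both
get INTEGER affinity and the values end up in the same integer storage class.
A quick check in the sqlite3 shell illustrates this (sketch, independent of
System.Data.SQLite; the mapping described above happens purely on the wrapper
side, keyed on the declared type name):

CREATE TABLE t(a INT, b INTEGER);
INSERT INTO t VALUES (5000000000, 5000000000);  -- value larger than 32 bits
SELECT typeof(a), typeof(b) FROM t;             -- both report 'integer'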

In order to customize the mappings, the flag UseConnectionTypes should
be used when opening the connection -AND- the AddTypeMapping method is
used to override a built-in mapping of the same name (e.g. "INT").

--
Joe Mistachkin



Re: [sqlite] Support for System.Data.SQLite: Different API type for int/integer columns

2019-01-27 Thread Simon Slavin
On 24 Jan 2019, at 3:27pm, JP  wrote:

> I have encountered the same problem as this: 
> https://stackoverflow.com/q/4925084

What is returned by typeof(reader["myColumn"]) in each case ?

I suspect that you're getting an integer when the column is declared as 
INTEGER, but not when the column is declared as INT.

Simon.


[sqlite] Support for System.Data.SQLite: Different API type for int/integer columns

2019-01-27 Thread JP
Hello,

I have encountered the same problem as this: https://stackoverflow.com/q/4925084

The answers don't explain why there is a bitness difference at run-time between 
the types retrieved from INT and INTEGER columns, and that's my question. From 
reading https://sqlite.org/datatype3.html I understand there should be no 
difference whatsoever between defining a column INT or INTEGER (other than 
whether a primary key may become a rowid alias).

I don't mean the bitness of how the integers are stored in the on-disk database,
but of the values returned by the System.Data.SQLite API (in particular via a
DataTable loaded by a SQLiteDataReader).

I have verified that declaring a column (which isn't any kind of key) as INT 
causes System.Data.SQLite to return Int32/int (possibly, depending on the 
value? Not sure); and declaring INTEGER causes the same value to be returned as 
Int64/long.

Can anyone explain this, or point to where this is actually documented, if I've 
missed it?

This is the first time I've used this mailing list; I searched for the answer
on sqlite.org and around the Web first. I hope I haven't missed any RTFM; if I
have, please let me know.

Thanks
Xavier Porras