[sqlite] Bug Report: "Is Null" where term propagated to a virtual table causes wrong query results

2018-09-07 Thread Josef Kučera

Hello,
I have found an obscure bug in the virtual table processing.

The schema looks like this:
CREATE TABLE t1 (id int, value text);
INSERT INTO t1 VALUES(1,'try');
CREATE TABLE t2 (ctx int, id int, value text);
INSERT INTO t2 VALUES(1,1,'good');
INSERT INTO t2 VALUES(2,2,'evil');

The following query works ok if t1 and t2 are real:
sqlite> select * from t2 left join t1 on t1.id=t2.ctx where t1.value is 
null;

2|2|evil||

If they are virtual (and the module implements ISNULL processing) the 
result is wrong:
sqlite> select * from t2 left join t1 on t1.id=t2.ctx where t1.value is 
null;

1|1|good||
2|2|evil||

Looking at WhereTrace I can see the ISNULL term applied to t1 without 
the L flag. The virtual module correctly returns 0 records for t1 
search, but t2 records are returned anyway.


As a workaround I can use a unary operator, effective blocking the 
ISNULL term processing in virtual module:
sqlite> select * from t2 left join t1 on t1.id=t2.ctx where +t1.value is 
null;

2|2|evil||

Best regards,
Joe
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Left Outer Join of a Table-valued Function

2017-04-07 Thread Josef Kučera
Hello,
I have discovered a somewhat strange behavior. I utilize the table-valued 
functionality to inject generated data into SQL queries. Imagine a query like 
this:

SELECT * FROM mData AS D
LEFT OUTER JOIN gLink(mData.Param1) AS L ON L.ID=D.ID

The virual table-valued gLink processes only the input parameter, all the other 
constraints are returned with argvIndex=0, omit=0 for SQLite to handle. The 
mData.Param1 seems to behave like an INNER JOIN constraint and limits the query 
output. Is that correct?

I have discovered a workaround:

SELECT * FROM mData AS D
LEFT OUTER JOIN (SELECT * FROM gLink(mData.Param1) AS dummy) AS L ON L.ID=D.ID

This query returns the expected data.

Best regards
Joe
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Flattener vs. result-set column names

2016-07-07 Thread Josef Kučera


- Original Message - 
From: "Simon Slavin" 

To: "SQLite mailing list" 
Sent: Thursday, July 07, 2016 5:11 PM
Subject: Re: [sqlite] Query Flattener vs. result-set column names




On 7 Jul 2016, at 3:37pm, Josef Kučera  wrote:

Imagine a query like "SELECT A.F1, B1.F FROM A JOIN (SELECT F2 AS F FROM 
B WHERE F3=0) B1". If the query flattener is active the result-set has 
columns "A.F1" and "B1.F". If it is disabled the result-set columns are 
"F1" and "F". The "short_column_names" option is set.


I thought the optimization used, would not change the result-set column 
names. Was I wrong?


Yes.  There is no documentation for result-set column names unless you 
specify them using "AS".  They can be anything, and they can change from 
one version of SQLite to another.  Never rely on them unless you're using 
"AS".


Yeah, I understand the column names are not defined, but I expected them to 
be static for a specific SQLite version, SQL command and database. This is a 
wrong assumption then?


Also, your natural JOIN (sometimes called ANSI JOIN) is a little dangerous 
since it can collapse if you change column names or definitions.  People 
have been complaining about the problem for over a decade.  An example is 
when a program stops working because someone added 'info' columns to both 
tables and the JOIN command is now looking only for cases where they 
match.


It's better to define the connection between the two tables explicitly.  I 
would hope to see something more like


SELECT A.F1, B.F FROM A JOIN B ON B.x = A.y

This lets you define column names all in one place more neatly:

SELECT A.F1 AS F1, B.F AS F1 FROM A JOIN B ON B.x = A.y

rather than having to put some "AS" at the beginning and other "AS" in the 
sub-select.


Thank you for the detailed explanation. The natural JOIN was used only to 
simplify the sample command, I always explicitly declare the columns used 
for the JOIN operation for the same reasons you mentioned.



Simon.


Joe

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Query Flattener vs. result-set column names

2016-07-07 Thread Josef Kučera
Hello,
today I have discovered a strange side-effect of a query flattener.

Imagine a query like "SELECT A.F1, B1.F FROM A JOIN (SELECT F2 AS F FROM B 
WHERE F3=0) B1". If the query flattener is active the result-set has columns 
"A.F1" and "B1.F". If it is disabled the result-set columns are "F1" and "F". 
The "short_column_names" option is set.

I thought the optimization used, would not change the result-set column names. 
Was I wrong?

Best regards
Joe

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Outer Join with Where terms - possible optimization

2016-04-29 Thread Josef Kučera
Hello,
I have discovered a possible query planner improvement. I am using the SQLite 
with virtual table modules.

Situation 1: Lets have a query like 'SELECT V.* FROM (select A.RowID A_ROWID, 
B.RowID B_ROWID, A.*, B.* from A left outer join B on B.ref=A.key) V WHERE 
V.B_ROWID=?', where A and B are virtual tables. If I am not mistaken the WHERE 
term effectively turns the outer join to an inner join and the query could be 
evaulated as two fast key searches. Unfortunately, currently the join is left 
as is and the execution peforms a sequentail full-scan of the A table with many 
key searches to the B table.

Situation 2: How does SQLite evaluate tables to the left of a left join? As it 
seems it is done by order in the SQL. So query like this: select * from A join 
B on B.ref=A.key left outer join C on C.ref=A.key where C.Value=? causes a full 
scan of A and B before even evaulating C. By looking at the query the B table 
does not seem to be a pre-requisite of table C, although is it marked as one in 
where trace.

Is this correct?

Thanks.
Joe


Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2014-12-15 Thread Josef Kučera
- Original Message - 
From: "Hick Gunter" 

To: "'General Discussion of SQLite Database'" 
Sent: Monday, December 15, 2014 2:40 PM
Subject: Re: [sqlite] Query Planner for Virtual Tables: link table 
evaluation & transitive property of constraints not used



I would concur in that SQLite is asking "which subset of the given 
constraints yields the most efficient access".


The possible query plans are

1) A() -> B(ID) -> C(LINKID)

2) C() -> B(LINKID) -> A(ID)

3) B() -> A(ID) + C(LINKID) or B() -> C(LINKID) + A(ID)

4) A() -> C() -> B(ID,LINKID) or C() -> A() -> B(ID,LINKID)

Assuming unique keys in A and C and cardinalities of a, b and c we have 
estimated costs (in # of records retrieved):


1) a + a*b/a + a*b/a*1 = a + 2b

2) c + c*b/c + c*b/c*1 = c + 2b

3) b + b*1 + b*1 = 3b

4) a + a*c + a*c*b/a/c = a + a*c + b (resp. c + a*c + b)

So which is the smallest cost?

We know that b <= a*c, which makes query plan 4 at least as expensive as 
plans 1 or 2 respectively.


Choosing between plans 1 and 2 means starting with the smaller of the two 
tables (assume a < c).


So how do plans 1 and 3 compare? Plan 3 is better only for very sparse 
link tables where b < a < c is true.




The reasoning is absolutely correct.

I forgot to mention, that most often the query has a where constraint 
(generated at runtime) limiting A or C to a small subset. In that scenario 
the smallest cost would be 1) or 2) respectively.


There is one more problem with plan 4): Assuming A and/or B and/or C has 
many records (100.000s) the query B(ID,LINKID) is executed for each and 
every record in A*C - if there is an constant cost for each B(ID,LINKID) 
execution then 4) is the most expensive option.


Joe 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2014-12-15 Thread Josef Kučera
On 12/15/2014 13:23 PM, Dan Kennedy wrote:
> On 12/12/2014 09:22 PM, Josef Kučera wrote:
> > Hello,
> > I am trying to use SQLite's marvellous Virtual Table mechanism as a SQL
> > layer for querying an in memory storage. This works good, but I have a
> > problem with more complex queries. When querying a real SQLite database it
> > correctly moves the constant conditions across joined tables to optimize
> > the execution plan (I think this was implemented in the 3.7.17 release).
> > Unfortunately for virtual tables this does not seem to be supported. I can
> > overcome this limitation by manually tuning the SQL, but it will help if
> > the query planner can do this automatically.
> >
> > The major problem I have is with link table evaluation. Imagine a SQL like
> > "select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID". The current
> > implementation evaluates cost of B only as B (ID, LINKID) causing the
> > execution to perform a full scan on either A or C. This seems to be caused
> > by the implementation of whereLoopAddVirtual() function. I think it should
> > evaluate cost for terms separated by tables in the right term as well, e.g.
> > for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID), B(ID,
> > LINKID) instead of only B() and B(ID, LINKID).
> >
> > What should I do?
>
> You want this (or the same thing with the roles of "A" and "C" reversed):
>
>* a full-scan on A,
>* a lookup on B by (b.id=?)
>* a lookup on C by (c.id=?)
>
> correct?
>
Yes, this is exactly what I want. It makes even more sense when there
is a WHERE
condition on table A.

> It's tricky. As you say, xBestIndex() will currently be invoked twice -
> once with no constraints usable and once with both "b.id=?" and
> "b.linkid=?" usable. I guess the reason it is not invoked in the other
> ways you suggest is that that strategy might conceivably require a huge
> number of xBestIndex() calls if there were more than a few other tables
> in the join.
>
You are absolutely correct. I do not think calling xBestIndex() for every
possible table combination is possible (too much xBestIndex calls and too much
WhereLoop variants to evaluate). I thought about adding a single call for each
table in the join, that could keep the amount of xBestIndex() calls reasonable,
and really help for this type of joined queries.

> You could change the query so that only one of the constraints is
> visible to the virtual table implementation. Say:
>
>select * from A join B on A.ID=B.ID join C on C.ID=+B.LINKID
>
This is great, it gives the possibility to choose the plan by the command.
Many thanks for the tip.

> Or rework the virtual table code so that it knows only to use one of
> "b.id=?" or "b.linkid=?" at a time. If the xBestIndex only uses one of
> the constraints, the planner should do the right thing.
>
Unfortunately this would be hard to implement, currently I use a generic
virtual table mechanism to make cross-database queries. I hoped the decision
which condition to use could be made by the planner, without hard-coding it in
the virtual table implementation.

> Dan.

Joe
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2014-12-12 Thread Josef Kučera
Hello,
I am trying to use SQLite's marvellous Virtual Table mechanism as a SQL
layer for querying an in memory storage. This works good, but I have a
problem with more complex queries. When querying a real SQLite database it
correctly moves the constant conditions across joined tables to optimize
the execution plan (I think this was implemented in the 3.7.17 release).
Unfortunately for virtual tables this does not seem to be supported. I can
overcome this limitation by manually tuning the SQL, but it will help if
the query planner can do this automatically.

The major problem I have is with link table evaluation. Imagine a SQL like
"select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID". The current
implementation evaluates cost of B only as B (ID, LINKID) causing the
execution to perform a full scan on either A or C. This seems to be caused
by the implementation of whereLoopAddVirtual() function. I think it should
evaluate cost for terms separated by tables in the right term as well, e.g.
for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID), B(ID,
LINKID) instead of only B() and B(ID, LINKID).

What should I do?

Best regards,
Joe
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users