[sqlite] Bug Report: "Is Null" where term propagated to a virtual table causes wrong query results
Hello, I have found an obscure bug in the virtual table processing. The schema looks like this: CREATE TABLE t1 (id int, value text); INSERT INTO t1 VALUES(1,'try'); CREATE TABLE t2 (ctx int, id int, value text); INSERT INTO t2 VALUES(1,1,'good'); INSERT INTO t2 VALUES(2,2,'evil'); The following query works ok if t1 and t2 are real: sqlite> select * from t2 left join t1 on t1.id=t2.ctx where t1.value is null; 2|2|evil|| If they are virtual (and the module implements ISNULL processing) the result is wrong: sqlite> select * from t2 left join t1 on t1.id=t2.ctx where t1.value is null; 1|1|good|| 2|2|evil|| Looking at WhereTrace I can see the ISNULL term applied to t1 without the L flag. The virtual module correctly returns 0 records for t1 search, but t2 records are returned anyway. As a workaround I can use a unary operator, effective blocking the ISNULL term processing in virtual module: sqlite> select * from t2 left join t1 on t1.id=t2.ctx where +t1.value is null; 2|2|evil|| Best regards, Joe ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Left Outer Join of a Table-valued Function
Hello, I have discovered a somewhat strange behavior. I utilize the table-valued functionality to inject generated data into SQL queries. Imagine a query like this: SELECT * FROM mData AS D LEFT OUTER JOIN gLink(mData.Param1) AS L ON L.ID=D.ID The virual table-valued gLink processes only the input parameter, all the other constraints are returned with argvIndex=0, omit=0 for SQLite to handle. The mData.Param1 seems to behave like an INNER JOIN constraint and limits the query output. Is that correct? I have discovered a workaround: SELECT * FROM mData AS D LEFT OUTER JOIN (SELECT * FROM gLink(mData.Param1) AS dummy) AS L ON L.ID=D.ID This query returns the expected data. Best regards Joe ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query Flattener vs. result-set column names
- Original Message - From: "Simon Slavin" To: "SQLite mailing list" Sent: Thursday, July 07, 2016 5:11 PM Subject: Re: [sqlite] Query Flattener vs. result-set column names On 7 Jul 2016, at 3:37pm, Josef Kučera wrote: Imagine a query like "SELECT A.F1, B1.F FROM A JOIN (SELECT F2 AS F FROM B WHERE F3=0) B1". If the query flattener is active the result-set has columns "A.F1" and "B1.F". If it is disabled the result-set columns are "F1" and "F". The "short_column_names" option is set. I thought the optimization used, would not change the result-set column names. Was I wrong? Yes. There is no documentation for result-set column names unless you specify them using "AS". They can be anything, and they can change from one version of SQLite to another. Never rely on them unless you're using "AS". Yeah, I understand the column names are not defined, but I expected them to be static for a specific SQLite version, SQL command and database. This is a wrong assumption then? Also, your natural JOIN (sometimes called ANSI JOIN) is a little dangerous since it can collapse if you change column names or definitions. People have been complaining about the problem for over a decade. An example is when a program stops working because someone added 'info' columns to both tables and the JOIN command is now looking only for cases where they match. It's better to define the connection between the two tables explicitly. I would hope to see something more like SELECT A.F1, B.F FROM A JOIN B ON B.x = A.y This lets you define column names all in one place more neatly: SELECT A.F1 AS F1, B.F AS F1 FROM A JOIN B ON B.x = A.y rather than having to put some "AS" at the beginning and other "AS" in the sub-select. Thank you for the detailed explanation. The natural JOIN was used only to simplify the sample command, I always explicitly declare the columns used for the JOIN operation for the same reasons you mentioned. Simon. Joe ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Query Flattener vs. result-set column names
Hello, today I have discovered a strange side-effect of a query flattener. Imagine a query like "SELECT A.F1, B1.F FROM A JOIN (SELECT F2 AS F FROM B WHERE F3=0) B1". If the query flattener is active the result-set has columns "A.F1" and "B1.F". If it is disabled the result-set columns are "F1" and "F". The "short_column_names" option is set. I thought the optimization used, would not change the result-set column names. Was I wrong? Best regards Joe ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Outer Join with Where terms - possible optimization
Hello, I have discovered a possible query planner improvement. I am using the SQLite with virtual table modules. Situation 1: Lets have a query like 'SELECT V.* FROM (select A.RowID A_ROWID, B.RowID B_ROWID, A.*, B.* from A left outer join B on B.ref=A.key) V WHERE V.B_ROWID=?', where A and B are virtual tables. If I am not mistaken the WHERE term effectively turns the outer join to an inner join and the query could be evaulated as two fast key searches. Unfortunately, currently the join is left as is and the execution peforms a sequentail full-scan of the A table with many key searches to the B table. Situation 2: How does SQLite evaluate tables to the left of a left join? As it seems it is done by order in the SQL. So query like this: select * from A join B on B.ref=A.key left outer join C on C.ref=A.key where C.Value=? causes a full scan of A and B before even evaulating C. By looking at the query the B table does not seem to be a pre-requisite of table C, although is it marked as one in where trace. Is this correct? Thanks. Joe
Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used
- Original Message - From: "Hick Gunter" To: "'General Discussion of SQLite Database'" Sent: Monday, December 15, 2014 2:40 PM Subject: Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used I would concur in that SQLite is asking "which subset of the given constraints yields the most efficient access". The possible query plans are 1) A() -> B(ID) -> C(LINKID) 2) C() -> B(LINKID) -> A(ID) 3) B() -> A(ID) + C(LINKID) or B() -> C(LINKID) + A(ID) 4) A() -> C() -> B(ID,LINKID) or C() -> A() -> B(ID,LINKID) Assuming unique keys in A and C and cardinalities of a, b and c we have estimated costs (in # of records retrieved): 1) a + a*b/a + a*b/a*1 = a + 2b 2) c + c*b/c + c*b/c*1 = c + 2b 3) b + b*1 + b*1 = 3b 4) a + a*c + a*c*b/a/c = a + a*c + b (resp. c + a*c + b) So which is the smallest cost? We know that b <= a*c, which makes query plan 4 at least as expensive as plans 1 or 2 respectively. Choosing between plans 1 and 2 means starting with the smaller of the two tables (assume a < c). So how do plans 1 and 3 compare? Plan 3 is better only for very sparse link tables where b < a < c is true. The reasoning is absolutely correct. I forgot to mention, that most often the query has a where constraint (generated at runtime) limiting A or C to a small subset. In that scenario the smallest cost would be 1) or 2) respectively. There is one more problem with plan 4): Assuming A and/or B and/or C has many records (100.000s) the query B(ID,LINKID) is executed for each and every record in A*C - if there is an constant cost for each B(ID,LINKID) execution then 4) is the most expensive option. Joe ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used
On 12/15/2014 13:23 PM, Dan Kennedy wrote: > On 12/12/2014 09:22 PM, Josef Kučera wrote: > > Hello, > > I am trying to use SQLite's marvellous Virtual Table mechanism as a SQL > > layer for querying an in memory storage. This works good, but I have a > > problem with more complex queries. When querying a real SQLite database it > > correctly moves the constant conditions across joined tables to optimize > > the execution plan (I think this was implemented in the 3.7.17 release). > > Unfortunately for virtual tables this does not seem to be supported. I can > > overcome this limitation by manually tuning the SQL, but it will help if > > the query planner can do this automatically. > > > > The major problem I have is with link table evaluation. Imagine a SQL like > > "select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID". The current > > implementation evaluates cost of B only as B (ID, LINKID) causing the > > execution to perform a full scan on either A or C. This seems to be caused > > by the implementation of whereLoopAddVirtual() function. I think it should > > evaluate cost for terms separated by tables in the right term as well, e.g. > > for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID), B(ID, > > LINKID) instead of only B() and B(ID, LINKID). > > > > What should I do? > > You want this (or the same thing with the roles of "A" and "C" reversed): > >* a full-scan on A, >* a lookup on B by (b.id=?) >* a lookup on C by (c.id=?) > > correct? > Yes, this is exactly what I want. It makes even more sense when there is a WHERE condition on table A. > It's tricky. As you say, xBestIndex() will currently be invoked twice - > once with no constraints usable and once with both "b.id=?" and > "b.linkid=?" usable. I guess the reason it is not invoked in the other > ways you suggest is that that strategy might conceivably require a huge > number of xBestIndex() calls if there were more than a few other tables > in the join. > You are absolutely correct. I do not think calling xBestIndex() for every possible table combination is possible (too much xBestIndex calls and too much WhereLoop variants to evaluate). I thought about adding a single call for each table in the join, that could keep the amount of xBestIndex() calls reasonable, and really help for this type of joined queries. > You could change the query so that only one of the constraints is > visible to the virtual table implementation. Say: > >select * from A join B on A.ID=B.ID join C on C.ID=+B.LINKID > This is great, it gives the possibility to choose the plan by the command. Many thanks for the tip. > Or rework the virtual table code so that it knows only to use one of > "b.id=?" or "b.linkid=?" at a time. If the xBestIndex only uses one of > the constraints, the planner should do the right thing. > Unfortunately this would be hard to implement, currently I use a generic virtual table mechanism to make cross-database queries. I hoped the decision which condition to use could be made by the planner, without hard-coding it in the virtual table implementation. > Dan. Joe ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used
Hello, I am trying to use SQLite's marvellous Virtual Table mechanism as a SQL layer for querying an in memory storage. This works good, but I have a problem with more complex queries. When querying a real SQLite database it correctly moves the constant conditions across joined tables to optimize the execution plan (I think this was implemented in the 3.7.17 release). Unfortunately for virtual tables this does not seem to be supported. I can overcome this limitation by manually tuning the SQL, but it will help if the query planner can do this automatically. The major problem I have is with link table evaluation. Imagine a SQL like "select * from A join B on A.ID=B.ID join C on C.ID=B.LINKID". The current implementation evaluates cost of B only as B (ID, LINKID) causing the execution to perform a full scan on either A or C. This seems to be caused by the implementation of whereLoopAddVirtual() function. I think it should evaluate cost for terms separated by tables in the right term as well, e.g. for the mentioned SQL, table B, it should try B(), B(ID), B(LINKID), B(ID, LINKID) instead of only B() and B(ID, LINKID). What should I do? Best regards, Joe ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users