[jira] Issue Comment Edited: (DERBY-3301) Incorrect result from query with nested EXIST
[ https://issues.apache.org/jira/browse/DERBY-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12561998#action_12561998 ] thomanie edited comment on DERBY-3301 at 1/24/08 3:39 AM: suites.All and derbyall passes with 'derby-3301-3b.diff' was (Author: thomanie): suites.All passes with 'derby-3301-3b.diff' derbyall is running. Incorrect result from query with nested EXIST - Key: DERBY-3301 URL: https://issues.apache.org/jira/browse/DERBY-3301 Project: Derby Issue Type: Bug Components: SQL Affects Versions: 10.1.3.1, 10.2.1.6, 10.3.2.1 Reporter: Craig Russell Assignee: Thomas Nielsen Attachments: d3301-queryplan.log, derby-3301-1.diff, derby-3301-1.stat, derby-3301-2.diff, derby-3301-3.diff, derby-3301-3b.diff, derby-3301.sql Derby returns unexpected results from a query with embedded EXIST clauses. The query SQL is generated by the JPOX jdo implementation and is supposed to return all of the PERSONS and PROJECTS where there is an entry in the join table. This query works as expected when using MySQL. Here's the query: SELECT UNBOUND_E.PERSONID, UNBOUND_P.PROJID FROM applicationidentity0.DEPARTMENTS THIS, applicationidentity0.PERSONS UNBOUND_E, applicationidentity0.PROJECTS UNBOUND_P WHERE EXISTS ( SELECT 1 FROM applicationidentity0.PERSONS THIS_EMPLOYEES_E WHERE EXISTS ( SELECT 1 FROM applicationidentity0.PROJECT_MEMBER THIS_EMPLOYEES_E_PROJECTS_P WHERE THIS_EMPLOYEES_E_PROJECTS_P.MEMBER = THIS_EMPLOYEES_E.PERSONID AND THIS_EMPLOYEES_E_PROJECTS_P.MEMBER = THIS_EMPLOYEES_E.PERSONID AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID AND UNBOUND_E.PERSONID = THIS_EMPLOYEES_E.PERSONID) ); PERSONID |PROJID --- 3 |1 5 |3 4 |3 2 |1 1 |1 5 rows selected I'm expecting 7 rows to be returned here, one row for each row in the join table. Here's the schema: CREATE TABLE departments ( ID INTEGER NOT NULL, NAME VARCHAR(32) NOT NULL, EMP_OF_THE_MONTH INTEGER, COMPANYID INTEGER, DISCRIMINATOR VARCHAR(255), CONSTRAINT DEPTS_COMP_FK FOREIGN KEY (COMPANYID) REFERENCES companies, CONSTRAINT DEPTS_PK PRIMARY KEY (ID) ); CREATE TABLE persons ( PERSONID INTEGER NOT NULL, FIRSTNAME VARCHAR(32) NOT NULL, LASTNAME VARCHAR(32) NOT NULL, MIDDLENAME VARCHAR(32), BIRTHDATE TIMESTAMP NOT NULL, ADDRID INTEGER, STREET VARCHAR(64), CITY VARCHAR(64), STATE CHAR(2), ZIPCODE CHAR(5), COUNTRY VARCHAR(64), HIREDATE TIMESTAMP, WEEKLYHOURS REAL, DEPARTMENT INTEGER, FUNDINGDEPT INTEGER, MANAGER INTEGER, MENTOR INTEGER, HRADVISOR INTEGER, SALARY REAL, WAGE REAL, DISCRIMINATOR varchar(255) NOT NULL, CONSTRAINT PERS_DEPT_FK FOREIGN KEY (DEPARTMENT) REFERENCES departments, CONSTRAINT PERS_FUNDDEPT_FK FOREIGN KEY (FUNDINGDEPT) REFERENCES departments, CONSTRAINT PERS_MANAGER_FK FOREIGN KEY (MANAGER) REFERENCES persons, CONSTRAINT PERS_MENTOR_FK FOREIGN KEY (MENTOR) REFERENCES persons, CONSTRAINT PERS_HRADVISOR_FK FOREIGN KEY (HRADVISOR) REFERENCES persons, CONSTRAINT EMPS_PK PRIMARY KEY (PERSONID) ); CREATE TABLE projects ( PROJID INTEGER NOT NULL, NAME VARCHAR(32) NOT NULL, BUDGET DECIMAL(11,2) NOT NULL, DISCRIMINATOR VARCHAR(255), CONSTRAINT PROJS_PK PRIMARY KEY (PROJID) ); CREATE TABLE project_member ( PROJID INTEGER REFERENCES projects NOT NULL, MEMBER INTEGER REFERENCES persons NOT NULL ); ij connect 'jdbc:derby:/Users/clr/apache/jdo/trunk/tck2/target/database/derby/jdotckdb'; ij set schema applicationidentity0; 0 rows inserted/updated/deleted ij select * from persons; PERSONID |FIRSTNAME |LASTNAME |MIDDLENAME |BIRTHDATE |ADDRID |STREET |CITY |STA|ZIPC|COUNTRY |HIREDATE |WEEKLYHOURS |DEPARTMENT |FUNDINGDEPT|MANAGER|MENTOR |HRADVISOR |SALARY |WAGE |DISCRIMINATOR
[jira] Issue Comment Edited: (DERBY-3301) Incorrect result from query with nested EXIST
[ https://issues.apache.org/jira/browse/DERBY-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12562135#action_12562135 ] thomanie edited comment on DERBY-3301 at 1/24/08 9:22 AM: Attaching 'derby-3301-test-1.diff', a junit version of Craigs original 'derby-3301.sql' reproscript. The new regression test called WhereExistsTest.java can be used as a starting point for DERBY-3349. The test-1 patch also adds the new test to functionTests/tests/lang/_Suite.java. The test-1 patch runs successfully with the 3b patch applied, both as a single test and when run through the lang-suite. was (Author: thomanie): Attaching 'derby-3301-test-1.diff', a junit version of Craigs original 'derby-3301.sql' reproscript. The new regression test called WhereExistsTest.java can be used as a starting point for DERBY-3349. The test-1 patch also adds the new test to functionTests/tests/lang/_Suite.java. The test runs successfully with the 3b patch. Incorrect result from query with nested EXIST - Key: DERBY-3301 URL: https://issues.apache.org/jira/browse/DERBY-3301 Project: Derby Issue Type: Bug Components: SQL Affects Versions: 10.1.3.1, 10.2.1.6, 10.3.2.1 Reporter: Craig Russell Assignee: Thomas Nielsen Attachments: d3301-queryplan.log, derby-3301-1.diff, derby-3301-1.stat, derby-3301-2.diff, derby-3301-3.diff, derby-3301-3b.diff, derby-3301-test-1.diff, derby-3301-test-1.stat, derby-3301.sql Derby returns unexpected results from a query with embedded EXIST clauses. The query SQL is generated by the JPOX jdo implementation and is supposed to return all of the PERSONS and PROJECTS where there is an entry in the join table. This query works as expected when using MySQL. Here's the query: SELECT UNBOUND_E.PERSONID, UNBOUND_P.PROJID FROM applicationidentity0.DEPARTMENTS THIS, applicationidentity0.PERSONS UNBOUND_E, applicationidentity0.PROJECTS UNBOUND_P WHERE EXISTS ( SELECT 1 FROM applicationidentity0.PERSONS THIS_EMPLOYEES_E WHERE EXISTS ( SELECT 1 FROM applicationidentity0.PROJECT_MEMBER THIS_EMPLOYEES_E_PROJECTS_P WHERE THIS_EMPLOYEES_E_PROJECTS_P.MEMBER = THIS_EMPLOYEES_E.PERSONID AND THIS_EMPLOYEES_E_PROJECTS_P.MEMBER = THIS_EMPLOYEES_E.PERSONID AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID AND UNBOUND_E.PERSONID = THIS_EMPLOYEES_E.PERSONID) ); PERSONID |PROJID --- 3 |1 5 |3 4 |3 2 |1 1 |1 5 rows selected I'm expecting 7 rows to be returned here, one row for each row in the join table. Here's the schema: CREATE TABLE departments ( ID INTEGER NOT NULL, NAME VARCHAR(32) NOT NULL, EMP_OF_THE_MONTH INTEGER, COMPANYID INTEGER, DISCRIMINATOR VARCHAR(255), CONSTRAINT DEPTS_COMP_FK FOREIGN KEY (COMPANYID) REFERENCES companies, CONSTRAINT DEPTS_PK PRIMARY KEY (ID) ); CREATE TABLE persons ( PERSONID INTEGER NOT NULL, FIRSTNAME VARCHAR(32) NOT NULL, LASTNAME VARCHAR(32) NOT NULL, MIDDLENAME VARCHAR(32), BIRTHDATE TIMESTAMP NOT NULL, ADDRID INTEGER, STREET VARCHAR(64), CITY VARCHAR(64), STATE CHAR(2), ZIPCODE CHAR(5), COUNTRY VARCHAR(64), HIREDATE TIMESTAMP, WEEKLYHOURS REAL, DEPARTMENT INTEGER, FUNDINGDEPT INTEGER, MANAGER INTEGER, MENTOR INTEGER, HRADVISOR INTEGER, SALARY REAL, WAGE REAL, DISCRIMINATOR varchar(255) NOT NULL, CONSTRAINT PERS_DEPT_FK FOREIGN KEY (DEPARTMENT) REFERENCES departments, CONSTRAINT PERS_FUNDDEPT_FK FOREIGN KEY (FUNDINGDEPT) REFERENCES departments, CONSTRAINT PERS_MANAGER_FK FOREIGN KEY (MANAGER) REFERENCES persons, CONSTRAINT PERS_MENTOR_FK FOREIGN KEY (MENTOR) REFERENCES persons, CONSTRAINT PERS_HRADVISOR_FK FOREIGN KEY (HRADVISOR) REFERENCES persons, CONSTRAINT EMPS_PK PRIMARY KEY (PERSONID) ); CREATE TABLE projects ( PROJID INTEGER NOT NULL, NAME VARCHAR(32) NOT NULL, BUDGET DECIMAL(11,2) NOT NULL, DISCRIMINATOR VARCHAR(255), CONSTRAINT PROJS_PK PRIMARY KEY (PROJID) ); CREATE TABLE project_member ( PROJID INTEGER REFERENCES projects NOT NULL, MEMBER INTEGER REFERENCES persons NOT NULL ); ij connect 'jdbc:derby:/Users/clr/apache/jdo/trunk/tck2/target/database/derby/jdotckdb'; ij set schema applicationidentity0; 0 rows inserted/updated/deleted ij select * from persons; PERSONID |FIRSTNAME |LASTNAME
[jira] Issue Comment Edited: (DERBY-3301) Incorrect result from query with nested EXIST
[ https://issues.apache.org/jira/browse/DERBY-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12561383#action_12561383 ] army edited comment on DERBY-3301 at 1/22/08 9:39 AM: - Even though the diff works, the condition for skipping flattening should be relaxed to only apply to EXISTS subqueries in a WHERE clause I think the condition may need to be narrowed down even further: EXISTS subqueries _are_ allowed to be flattened from a WHERE clause _if_ that subquery's WHERE clause does not itself contain another subquery. As an example, take a look at the last query on the doc page mentioned above, namely: SELECT t1.* FROM t1, t2 WHERE EXISTS (SELECT * FROM t3 WHERE t1.c1 = t3.c1) gets flattened into SELECT t1.* FROM t1, t2, t3 WHERE t1.c1 = t3.c1 If changes are made to completely avoid flattening of EXISTS subqueries in WHERE clauses, then the above query will *not* be flattened into an exists join, even though it's perfectly valid to perform flattening in that case. With the approach that you've outlined the exists join optimization as a whole will be disabled (I think?). I think that has the potential to cause a performance regression for users whose queries benefit from the EXISTS join optimization today. A similar discussion was held for DERBY-3231 and in that case a decision was made to deliberately remove an optimization for the sake of correctness. But the use cases affected by that particular optimization were (most likely) few and far between. With this issue, though, I think removing EXISTS subquery flattening across the board will affect far more users, so I worry about committing such a fix. The patch as posted does not apply to the current trunk, but from what I can tell, if you run a query like the one mentioned in the documentation, ex.: create table t1 (c1 int, c2 int, c3 int); create table t2 (i int, j int); create table t3 (c1 int, vc varchar(10)); insert into t1 values (1, -1, 1), (3, -3, 9), (2, -2, 4); insert into t2 values (2, 4); insert into t3 values (1, 'one'), (3, 'three'); select t1.* from t1, t2 where exists (select * from t3 where t1.c1 = t3.c1); I think derby-3301-1.diff will cause the SELECT to skip flattening of the EXISTS subquery. Is that correct? The core problem for this issue appears to be the specific case where we have a subquery SQ1 that appears in the whereClause of *another* subquery SQ0. In that case disabling the flattening of SQ0 would be appropriate--as the documentation states. But if subquery SQ1 appears in the whereClause of an *OUTER* query--as shown above--I don't think we should disable flattening altogether. If at all possible, I think it'd be better to fix the flattening condition for the specific situation of nested subqueries than to completely disable WHERE clause flattening for EXISTS subqueries. Or put differently: if the documentation is correct, the _intent_ is to skip flattening of an EXISTS subquery that has predicates which in turn contain other subqueries. But the current code does not correctly implement that intent. So I think it'd be good to figure out _why_ the current code is wrong for the case of nested subqueries, and then try to make a change that addresses that specific problem. On a slightly different note, have you had a chance to run the regression suites with this change? I'm curious to know if any of the existing tests actually test for EXISTS join flattening--and if so, do those tests still pass with the proposed change? was (Author: army): Even though the diff works, the condition for skipping flattening should be relaxed to only apply to EXISTS subqueries in a WHERE clause I think the condition may need to be narrowed down even further: EXISTS subqueries _are_ allowed to be flattened from a WHERE clause _if_ the WHERE clause is not itself part of another subquery. As an example, take a look at the last query on the doc page mentioned above, namely: SELECT t1.* FROM t1, t2 WHERE EXISTS (SELECT * FROM t3 WHERE t1.c1 = t3.c1) gets flattened into SELECT t1.* FROM t1, t2, t3 WHERE t1.c1 = t3.c1 If changes are made to completely avoid flattening of EXISTS subqueries in WHERE clauses, then the above query will *not* be flattened into an exists join, even though it's perfectly valid to perform flattening in that case. With the approach that you've outlined the exists join optimization as a whole will be disabled (I think?). I think that has the potential to cause a performance regression for users whose queries benefit from the EXISTS join optimization today. A similar discussion was held for DERBY-3231 and in that case a decision was made to deliberately remove an optimization for the sake of correctness. But the use cases affected by that particular optimization were (most likely) few and far between. With this issue, though,
Re: [jira] Issue Comment Edited: (DERBY-3301) Incorrect result from query with nested EXIST
A B (JIRA) wrote: I think the condition may need to be narrowed down even further: EXISTS subqueries _are_ allowed to be flattened from a WHERE clause _if_ that subquery's WHERE clause does not itself contain another subquery. Correct - the updated patch I'm working on attempts to skip flattening only in this scenario as per the javadoc in derby-3301-1.diff, and still flatten EXISTS subqueries without a subquery in the where clause. I didn't check patch available due to this :) I think that has the potential to cause a performance regression for users whose queries benefit from the EXISTS join optimization today. Also correct. With derby-3301-1.diff applied, the query takes noticeably more time to execute. But that is reasonable because it disables flattening of the nested exist where clause all together. The patch as posted does not apply to the current trunk, My apologies. I'll reattach it later tonight. Probably due to Katheys changes for DERBY-3257 that wasn't in my sandbox. If at all possible, I think it'd be better to fix the flattening condition for the specific situation of nested subqueries than to completely disable WHERE clause flattening for EXISTS subqueries. That is the intent of the upcoming patch. On a slightly different note, have you had a chance to run the regression suites with this change? I'm curious to know if any of the existing tests actually test for EXISTS join flattening--and if so, do those tests still pass with the proposed change? I shared your fear, but suites.All actually ran cleanly with the patch applied.
Re: [jira] Issue Comment Edited: (DERBY-3301) Incorrect result from query with nested EXIST
Thomas Nielsen wrote: A B (JIRA) wrote: I think the condition may need to be narrowed down even further: EXISTS subqueries _are_ allowed to be flattened from a WHERE clause _if_ that subquery's WHERE clause does not itself contain another subquery. Correct - the updated patch I'm working on attempts to skip flattening only in this scenario as per the javadoc in derby-3301-1.diff, and still flatten EXISTS subqueries without a subquery in the where clause. Okay, great. I didn't quite get that from the first version of the patch/comments. Thanks for clarifying. If at all possible, I think it'd be better to fix the flattening condition for the specific situation of nested subqueries than to completely disable WHERE clause flattening for EXISTS subqueries. That is the intent of the upcoming patch. Ok, this sounds good :) I shared your fear, but suites.All actually ran cleanly with the patch applied. What about derbyall? My guess is that most tests which verify query plans currently reside in derbyall as they have not yet been converted to JUnit. If there are no tests to explicitly check for the occurrence of EXISTS subquery flattening, then that seems like an oversight in the testing for such flattening...but that would be a separate issue altogether (not something you'd have to address for DERBY-3301). Army
[jira] Issue Comment Edited: (DERBY-3301) Incorrect result from query with nested EXIST
[ https://issues.apache.org/jira/browse/DERBY-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12561073#action_12561073 ] army edited comment on DERBY-3301 at 1/21/08 9:34 AM: - bryan Perhaps the flattening logic is evaluating the correctness rules properly bryan for a single level of flattening but isn't handling flattening-within-flattening? Without having done any explicit tracing myself, this is what I would guess is the problem. Perhaps the inner-most EXISTS subquery can be considered an exists base table (and thus only returns a single row) w.r.t. it's parent query, but if it is then flattened _again_ to the outer-most query, maybe it should no longer be marked as an exists base table? I don't if that's actually true, I'm just speculating. craig I also found http://db.apache.org/derby/docs/10.3/tuning/ctuntransform25868.html that craig goes into some detail of the EXISTS join flattening Something from that page which stands out to me is the following condition: None of the predicates in the subquery, including the additional one formed between the left side of the subquery operator and the column in the subquery's SELECT list (for IN or ANY subqueries), include any subqueries, method calls, or field accesses. If an EXISTS clause is considered a predicate, then the query select ... from ... where exists ( select 1 from ... where exists ( select 1 from ... ... seems to violate the condition quoted above from the documentation. That is, the subquery for the first EXISTS has a predicate (in this case another EXISTS query) which includes another subquery (select 1 from ...), and therefore the first subquery should not be flattened. If this is the correct reading then the bug would appear to be in the logic that checks for the aforementioned condition--presumably because that check occurs after the inner-most subquery has itself been flattened? Again, I'm just guessing, I haven't done much examination of the actual codepath... was (Author: army): bryan Perhaps the flattening logic is evaluating the correctness rules properly bryan for a single level of flattening but isn't handling flattening-within-flattening? Without having done any explicit tracing myself, this is what I would guess is the problem. Perhaps the inner-most EXISTS subquery can be considered an exists base table (and thus only returns a single row) w.r.t. it's parent query, but if it is then flattened _again_ to the outer-most query, maybe it should no longer be marked as an exists base table? I don't if that's actually true, I'm just speculating. craig I also found http://db.apache.org/derby/docs/10.3/tuning/ctuntransform25868.html that craig goes into some detail of the EXISTS join flattening Something from that page which stands out to me is the following condition: None of the predicates in the subquery, including the additional one formed between the left side of the subquery operator and the column in the subquery's SELECT list (for IN or ANY subqueries), include any subqueries, method calls, or field accesses. If an EXISTS clause is considered a predicate, then the query select ... from ... where exists ( select 1 from ... where exists ( select 1 from ... ... seems to violate the condition quoted above from the documentation. That is, the subquery for the first EXISTS has a predicate (in this case another EXISTS query) which includes another subquery (select 1 from ...), and therefore the first subquery should not be flattened. Or am I misreading? If this is the correct reading then the bug would appear to be in the logic that checks for the aforementioned condition--presumably because that check occurs after the inner-most subquery has itself been flattened? Just guessing...feel free to ignore. Incorrect result from query with nested EXIST - Key: DERBY-3301 URL: https://issues.apache.org/jira/browse/DERBY-3301 Project: Derby Issue Type: Bug Components: SQL Affects Versions: 10.1.3.1, 10.2.1.6, 10.3.2.1 Reporter: Craig Russell Attachments: d3301-queryplan.log, derby-3301.sql Derby returns unexpected results from a query with embedded EXIST clauses. The query SQL is generated by the JPOX jdo implementation and is supposed to return all of the PERSONS and PROJECTS where there is an entry in the join table. This query works as expected when using MySQL. Here's the query: SELECT UNBOUND_E.PERSONID, UNBOUND_P.PROJID FROM applicationidentity0.DEPARTMENTS THIS, applicationidentity0.PERSONS UNBOUND_E, applicationidentity0.PROJECTS UNBOUND_P WHERE EXISTS ( SELECT 1 FROM applicationidentity0.PERSONS THIS_EMPLOYEES_E
[jira] Issue Comment Edited: (DERBY-3301) Incorrect result from query with nested EXIST
[ https://issues.apache.org/jira/browse/DERBY-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12561073#action_12561073 ] army edited comment on DERBY-3301 at 1/21/08 9:35 AM: - bryan Perhaps the flattening logic is evaluating the correctness rules properly bryan for a single level of flattening but isn't handling flattening-within-flattening? Without having done any explicit tracing myself, this is what I would guess is the problem. Perhaps the inner-most EXISTS subquery can be considered an exists base table (and thus only returns a single row) w.r.t. it's parent query, but if it is then flattened _again_ to the outer-most query, maybe it should no longer be marked as an exists base table? I don't if that's actually true, I'm just speculating. craig I also found http://db.apache.org/derby/docs/10.3/tuning/ctuntransform25868.html that craig goes into some detail of the EXISTS join flattening Something from that page which stands out to me is the following condition: None of the predicates in the subquery, including the additional one formed between the left side of the subquery operator and the column in the subquery's SELECT list (for IN or ANY subqueries), include any subqueries, method calls, or field accesses. If an EXISTS clause is considered a predicate, then the query select ... from ... where exists ( select 1 from ... where exists ( select 1 from ... ... seems to violate the condition quoted above from the documentation. That is, the subquery for the first EXISTS has a predicate (in this case another EXISTS query) which includes another subquery (select 1 from ...), and therefore the first subquery should not be flattened. If this is the correct reading then the bug would appear to be in the logic that checks for the aforementioned condition--presumably because that check occurs after the inner-most subquery has itself been flattened? Again, I'm just guessing, I haven't done much examination of the actual codepath... was (Author: army): bryan Perhaps the flattening logic is evaluating the correctness rules properly bryan for a single level of flattening but isn't handling flattening-within-flattening? Without having done any explicit tracing myself, this is what I would guess is the problem. Perhaps the inner-most EXISTS subquery can be considered an exists base table (and thus only returns a single row) w.r.t. it's parent query, but if it is then flattened _again_ to the outer-most query, maybe it should no longer be marked as an exists base table? I don't if that's actually true, I'm just speculating. craig I also found http://db.apache.org/derby/docs/10.3/tuning/ctuntransform25868.html that craig goes into some detail of the EXISTS join flattening Something from that page which stands out to me is the following condition: None of the predicates in the subquery, including the additional one formed between the left side of the subquery operator and the column in the subquery's SELECT list (for IN or ANY subqueries), include any subqueries, method calls, or field accesses. If an EXISTS clause is considered a predicate, then the query select ... from ... where exists ( select 1 from ... where exists ( select 1 from ... ... seems to violate the condition quoted above from the documentation. That is, the subquery for the first EXISTS has a predicate (in this case another EXISTS query) which includes another subquery (select 1 from ...), and therefore the first subquery should not be flattened. If this is the correct reading then the bug would appear to be in the logic that checks for the aforementioned condition--presumably because that check occurs after the inner-most subquery has itself been flattened? Again, I'm just guessing, I haven't done much examination of the actual codepath... Incorrect result from query with nested EXIST - Key: DERBY-3301 URL: https://issues.apache.org/jira/browse/DERBY-3301 Project: Derby Issue Type: Bug Components: SQL Affects Versions: 10.1.3.1, 10.2.1.6, 10.3.2.1 Reporter: Craig Russell Attachments: d3301-queryplan.log, derby-3301.sql Derby returns unexpected results from a query with embedded EXIST clauses. The query SQL is generated by the JPOX jdo implementation and is supposed to return all of the PERSONS and PROJECTS where there is an entry in the join table. This query works as expected when using MySQL. Here's the query: SELECT UNBOUND_E.PERSONID, UNBOUND_P.PROJID FROM applicationidentity0.DEPARTMENTS THIS, applicationidentity0.PERSONS UNBOUND_E, applicationidentity0.PROJECTS UNBOUND_P WHERE EXISTS ( SELECT 1 FROM
[jira] Issue Comment Edited: (DERBY-3301) Incorrect result from query with nested EXIST
[ https://issues.apache.org/jira/browse/DERBY-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12561138#action_12561138 ] thomanie edited comment on DERBY-3301 at 1/21/08 1:55 PM: Thanks a lot for all the pointers folks. I think both Army and Bryan nailed it on where/why this query fails on the trunk. To check the hypothesis, I tried marking any subquery in the wherePredicates as such and skip flattening if this is a subquery in a whereClause. The happens to be exactly what Kathey did in DERBY-3257 for havingSubqueries. With the mark-and-skip fix the correct 7 rows are returned as the flattening is skipped. My approach is possibly too brute, and might be relaxed, but I'll post a proper patch tomorrow morning. was (Author: thomanie): Thanks a lot for all the pointers folks. I think both Army and Bryan nailed it on where/why this query fails on the trunk. To check the hypothesis, I tried marking any subquery in the wherePredicates as such and skip flattening if this is a subquery in a whereClause. The happens to be exactly what Kathy did in DERBY-3257 for havingSubqueries. With the mark-and-skip fix the correct 7 rows are returned as the flattening is skipped. My approach is possibly too brute, and might be relaxed, but I'll post a proper patch tomorrow morning. Incorrect result from query with nested EXIST - Key: DERBY-3301 URL: https://issues.apache.org/jira/browse/DERBY-3301 Project: Derby Issue Type: Bug Components: SQL Affects Versions: 10.1.3.1, 10.2.1.6, 10.3.2.1 Reporter: Craig Russell Attachments: d3301-queryplan.log, derby-3301.sql Derby returns unexpected results from a query with embedded EXIST clauses. The query SQL is generated by the JPOX jdo implementation and is supposed to return all of the PERSONS and PROJECTS where there is an entry in the join table. This query works as expected when using MySQL. Here's the query: SELECT UNBOUND_E.PERSONID, UNBOUND_P.PROJID FROM applicationidentity0.DEPARTMENTS THIS, applicationidentity0.PERSONS UNBOUND_E, applicationidentity0.PROJECTS UNBOUND_P WHERE EXISTS ( SELECT 1 FROM applicationidentity0.PERSONS THIS_EMPLOYEES_E WHERE EXISTS ( SELECT 1 FROM applicationidentity0.PROJECT_MEMBER THIS_EMPLOYEES_E_PROJECTS_P WHERE THIS_EMPLOYEES_E_PROJECTS_P.MEMBER = THIS_EMPLOYEES_E.PERSONID AND THIS_EMPLOYEES_E_PROJECTS_P.MEMBER = THIS_EMPLOYEES_E.PERSONID AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID AND UNBOUND_E.PERSONID = THIS_EMPLOYEES_E.PERSONID) ); PERSONID |PROJID --- 3 |1 5 |3 4 |3 2 |1 1 |1 5 rows selected I'm expecting 7 rows to be returned here, one row for each row in the join table. Here's the schema: CREATE TABLE departments ( ID INTEGER NOT NULL, NAME VARCHAR(32) NOT NULL, EMP_OF_THE_MONTH INTEGER, COMPANYID INTEGER, DISCRIMINATOR VARCHAR(255), CONSTRAINT DEPTS_COMP_FK FOREIGN KEY (COMPANYID) REFERENCES companies, CONSTRAINT DEPTS_PK PRIMARY KEY (ID) ); CREATE TABLE persons ( PERSONID INTEGER NOT NULL, FIRSTNAME VARCHAR(32) NOT NULL, LASTNAME VARCHAR(32) NOT NULL, MIDDLENAME VARCHAR(32), BIRTHDATE TIMESTAMP NOT NULL, ADDRID INTEGER, STREET VARCHAR(64), CITY VARCHAR(64), STATE CHAR(2), ZIPCODE CHAR(5), COUNTRY VARCHAR(64), HIREDATE TIMESTAMP, WEEKLYHOURS REAL, DEPARTMENT INTEGER, FUNDINGDEPT INTEGER, MANAGER INTEGER, MENTOR INTEGER, HRADVISOR INTEGER, SALARY REAL, WAGE REAL, DISCRIMINATOR varchar(255) NOT NULL, CONSTRAINT PERS_DEPT_FK FOREIGN KEY (DEPARTMENT) REFERENCES departments, CONSTRAINT PERS_FUNDDEPT_FK FOREIGN KEY (FUNDINGDEPT) REFERENCES departments, CONSTRAINT PERS_MANAGER_FK FOREIGN KEY (MANAGER) REFERENCES persons, CONSTRAINT PERS_MENTOR_FK FOREIGN KEY (MENTOR) REFERENCES persons, CONSTRAINT PERS_HRADVISOR_FK FOREIGN KEY (HRADVISOR) REFERENCES persons, CONSTRAINT EMPS_PK PRIMARY KEY (PERSONID) ); CREATE TABLE projects ( PROJID INTEGER NOT NULL, NAME VARCHAR(32) NOT NULL, BUDGET DECIMAL(11,2) NOT NULL, DISCRIMINATOR VARCHAR(255), CONSTRAINT PROJS_PK PRIMARY KEY (PROJID) ); CREATE TABLE project_member ( PROJID INTEGER REFERENCES projects NOT NULL, MEMBER INTEGER REFERENCES persons NOT NULL ); ij connect 'jdbc:derby:/Users/clr/apache/jdo/trunk/tck2/target/database/derby/jdotckdb'; ij set schema
[jira] Issue Comment Edited: (DERBY-3301) Incorrect result from query with nested EXIST
[ https://issues.apache.org/jira/browse/DERBY-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12560414#action_12560414 ] thomanie edited comment on DERBY-3301 at 1/18/08 6:52 AM: A little further investigation shows that HashScanResultSet.getNextRowCore() is acutally doing what it is supposed to do. The underlying ResultSets build a HashMap based on a key, and puts all rows matching this key into the map with that key. For the repro this means a HashMap with 3 keys with 3, 2 and 3 rows for the map entries respectively as seen in the debugger. In getNextRow() we pick the first element for a given key from the map, then move on to the next key, even though there are unfetched entries left in the map. This is where the code is flawed at the moment. For each invocation of getNextRowCore(), a call is done to reopenCore() as well. This looks suspicious. reopenCore() is a good citizen and resets any reference variables with a call to resetProbeVariables(). This call resets the indexing we use in getNextRowCore() to keep track of the index into the map entries, and force us to move on to the next key in the HashMap instead of the next entry in the HashMap. It's right there in the queryplan as well with Number of opens = 3 for the lower HashScanResultSet :) There's the reason for the three, and not seven, rows. The big question right now is who calls, and why is, reopenCore() called twice? I probably won't have time to investigate until Monday. If someone else feels like fixing this, please go ahead. was (Author: thomanie): A little further investigation shows that HashScanResultSet.getNextRowCore() is acutally doing what it is supposed to do. The underlying ResultSets build a HashMap based on a key, and puts all rows matching this key into the map with that key. For the repro this means a HashMap with 3 keys with 3, 2 and 3 rows for the map entries respectively as seen in the debugger. In getNextRow() we pick the first element for a given key from the map, then move on to the next key, even though there are unfetched entries left in the map. This is where the code is flawed at the moment. For each invocation of getNextRowCore(), a call is done to openCore() as well. This looks suspicious. openCore() is a good citizen and resets any reference variables with a call to resetProbeVariables(). This call resets the indexing we use in getNextRowCore() to keep track of the index into the map entries, and force us to move on to the next key in the HashMap instead of the next entry in the HashMap. It's right there in the queryplan as well with Number of opens = 3 for the lower HashScanResultSet :) There's the reason for the three, and not seven, rows. The big question right now is who calls, and why is, openCore() called trice? I probably won't have time to investigate until Monday. If someone else feels like fixing this, please go ahead. Incorrect result from query with nested EXIST - Key: DERBY-3301 URL: https://issues.apache.org/jira/browse/DERBY-3301 Project: Derby Issue Type: Bug Components: SQL Affects Versions: 10.1.3.1, 10.2.1.6, 10.3.2.1 Reporter: Craig Russell Attachments: d3301-queryplan.log, derby-3301.sql Derby returns unexpected results from a query with embedded EXIST clauses. The query SQL is generated by the JPOX jdo implementation and is supposed to return all of the PERSONS and PROJECTS where there is an entry in the join table. This query works as expected when using MySQL. Here's the query: SELECT UNBOUND_E.PERSONID, UNBOUND_P.PROJID FROM applicationidentity0.DEPARTMENTS THIS, applicationidentity0.PERSONS UNBOUND_E, applicationidentity0.PROJECTS UNBOUND_P WHERE EXISTS ( SELECT 1 FROM applicationidentity0.PERSONS THIS_EMPLOYEES_E WHERE EXISTS ( SELECT 1 FROM applicationidentity0.PROJECT_MEMBER THIS_EMPLOYEES_E_PROJECTS_P WHERE THIS_EMPLOYEES_E_PROJECTS_P.MEMBER = THIS_EMPLOYEES_E.PERSONID AND THIS_EMPLOYEES_E_PROJECTS_P.MEMBER = THIS_EMPLOYEES_E.PERSONID AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID AND UNBOUND_E.PERSONID = THIS_EMPLOYEES_E.PERSONID) ); PERSONID |PROJID --- 3 |1 5 |3 4 |3 2 |1 1 |1 5 rows selected I'm expecting 7 rows to be returned here, one row for each row in the join table. Here's the schema: CREATE TABLE departments ( ID INTEGER NOT NULL, NAME VARCHAR(32) NOT NULL, EMP_OF_THE_MONTH INTEGER, COMPANYID INTEGER,
[jira] Issue Comment Edited: (DERBY-3301) Incorrect result from query with nested EXIST
[ https://issues.apache.org/jira/browse/DERBY-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12560582#action_12560582 ] thomanie edited comment on DERBY-3301 at 1/18/08 2:53 PM: Thanks for the pointers Bryan. Above the HashScanResultSet, there is a NestedLoopJoinResultSet pulling rows from both left and right childs in getNextRowCore(). NesteLoopJoinResultSet.getNextRowCore() will stop pulling rows if its member oneRowRightSide is true. oneRowRightSide is supplied to the constructor, so it's set at ResultSet generation time. NestedLoopJoinResultSet is generated by GenericResultSetFactory.getNestedLoopJoinResult(), which again is only called from JoinNode.generate(). To see this you first have to find the getter to the NestedLoopJoinResultSet in GenericResultSetFactory, then do a text(!) search for use of the getter method. In this case it's only used once - in NesteLoopJoinStrategy.joinResultSetMethodName(), which again is only used in JoinNode.generate. So why is it setting oneRowRightSide to true? The arguments to the NestedLoopJoinResultSet constructor is pushed to the stack by JoinNode.getJoinArguments(), where one calls JoinNode.oneRowRightSide(). oneRowRightSide() simply pushes rightResultSet.isOneRowResultSet() to the stack and continues along merrily (JoinNode.java @ 1691) In the debugger we see that the first hit of a breakpoint in JoinNode.java @1691 is the interesting one. rightResultSet is a ProjectRestrictNode over a FromBaseTable. That is the select we want to look at. ProjectRestrictNode.isOneRowResult() simply calls its childResult.isOneRowResult(), so we end up in FromBaseTable.isOneRowResult(). FromBaseTable.isOneRowResult() start off with this comment and code: // EXISTS FBT will only return a single row if (existsBaseTable) { return true; } existsBaseTable is true in the failing query. existsBaseTable is only set in FromBaseTable.setExistsBaseTable(), and this method is only called form FromList.genExistsBaseTables() where it is explicitly set to true. It seems to me that the assumption made in FromBaseTable.isOneRowResultI() is wrong? It's been there ever since derby code was donated. That's most likely the reason for all previous versions showing identical behaviour. With those 4 lines making the single row assumption for exists commented out, and letting the rest of the isOneRowResult() logic make the decision, it returns false as expected in this case. The query returns 7 rows as Craig expects. EMPID |PROJID --- 13 |101 12 |101 11 |101 13 |102 12 |102 15 |103 14 |103 Any one care to comment on my analysis and especially on the exists assumption made? If this is correct, any query involving and EXISTS *potentially* will produce wrong results, and this could be the root cause for a couple of the other EXISTS issues? was (Author: thomanie): Thanks for the pointers Bryan. NesteLoopJoinResultSet.getNextRowCore() will stop pulling rows if its member oneRowRightSide is true. oneRowRightSide is supplied to the constructor, so it's set at ResultSet generation time. NestedLoopJoinResultSet is generated by GenericResultSetFactory.getNestedLoopJoinResult(), which again is only called from JoinNode.generate(). To see this you first have to find the getter to the NestedLoopJoinResultSet in GenericResultSetFactory, then do a text(!) search for use of the getter method. In this case it's only used once - in NesteLoopJoinStrategy.joinResultSetMethodName(), which again is only used in JoinNode.generate. So why is it setting oneRowRightSide to true? The arguments to the NestedLoopJoinResultSet constructor is pushed to the stack by JoinNode.getJoinArguments(), where one calls JoinNode.oneRowRightSide(). oneRowRightSide() simply pushes rightResultSet.isOneRowResultSet() to the stack and continues along merrily (JoinNode.java @ 1691) In the debugger we see that the first hit of a breakpoint in JoinNode.java @1691 is the interesting one. rightResultSet is a ProjectRestrictNode over a FromBaseTable. That is the select we want to look at. ProjectRestrictNode.isOneRowResult() simply calls its childResult.isOneRowResult(), so we end up in FromBaseTable.isOneRowResult(). FromBaseTable.isOneRowResult() start off with this comment and code: // EXISTS FBT will only return a single row if (existsBaseTable) { return true; } existsBaseTable is true in the failing query. existsBaseTable is only set in FromBaseTable.setExistsBaseTable(), and this method is only called form FromList.genExistsBaseTables() where it is explicitly set to true. It seems to me that the assumption made in FromBaseTable.isOneRowResultI() is wrong? It's been there ever
[jira] Issue Comment Edited: (DERBY-3301) Incorrect result from query with nested EXIST
[ https://issues.apache.org/jira/browse/DERBY-3301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12556033#action_12556033 ] clr edited comment on DERBY-3301 at 1/4/08 11:43 AM: --- There is a similar query that sometimes returns the expected result and sometimes returns a single row. The difference in the query is that there is an extra clause in the WHERE clause: AND UNBOUND_P.NAME = 'orange' This query is expected to return three rows PERSONID |PROJID --- 3 |1 2 |1 1 |1 3 rows selected but sometimes only one row is returned PERSONID |PROJID --- 1 |1 1 rows selected was (Author: clr): There is a similar query that sometimes returns the expected result and sometimes returns a single row. The difference in the query is that there is an extra clause in the WHERE clause: AND UNBOUND_P.NAME = 'orange' This query is expected to return three rows PERSONID |PROJID --- 3 |1 2 |1 1 |1 3 rows selected but sometimes only one row is returned PERSONID |PROJID --- 1 |1 3 rows selected Incorrect result from query with nested EXIST - Key: DERBY-3301 URL: https://issues.apache.org/jira/browse/DERBY-3301 Project: Derby Issue Type: Bug Components: SQL Affects Versions: 10.1.3.1, 10.2.1.6, 10.3.2.1 Reporter: Craig Russell Derby returns unexpected results from a query with embedded EXIST clauses. The query SQL is generated by the JPOX jdo implementation and is supposed to return all of the PERSONS and PROJECTS where there is an entry in the join table. This query works as expected when using MySQL. Here's the query: SELECT UNBOUND_E.PERSONID, UNBOUND_P.PROJID FROM applicationidentity0.DEPARTMENTS THIS, applicationidentity0.PERSONS UNBOUND_E, applicationidentity0.PROJECTS UNBOUND_P WHERE EXISTS ( SELECT 1 FROM applicationidentity0.PERSONS THIS_EMPLOYEES_E WHERE EXISTS ( SELECT 1 FROM applicationidentity0.PROJECT_MEMBER THIS_EMPLOYEES_E_PROJECTS_P WHERE THIS_EMPLOYEES_E_PROJECTS_P.MEMBER = THIS_EMPLOYEES_E.PERSONID AND THIS_EMPLOYEES_E_PROJECTS_P.MEMBER = THIS_EMPLOYEES_E.PERSONID AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND THIS_EMPLOYEES_E.DEPARTMENT = THIS.ID AND UNBOUND_P.PROJID = THIS_EMPLOYEES_E_PROJECTS_P.PROJID AND UNBOUND_E.PERSONID = THIS_EMPLOYEES_E.PERSONID) ); PERSONID |PROJID --- 3 |1 5 |3 4 |3 2 |1 1 |1 5 rows selected I'm expecting 7 rows to be returned here, one row for each row in the join table. Here's the schema: CREATE TABLE departments ( ID INTEGER NOT NULL, NAME VARCHAR(32) NOT NULL, EMP_OF_THE_MONTH INTEGER, COMPANYID INTEGER, DISCRIMINATOR VARCHAR(255), CONSTRAINT DEPTS_COMP_FK FOREIGN KEY (COMPANYID) REFERENCES companies, CONSTRAINT DEPTS_PK PRIMARY KEY (ID) ); CREATE TABLE persons ( PERSONID INTEGER NOT NULL, FIRSTNAME VARCHAR(32) NOT NULL, LASTNAME VARCHAR(32) NOT NULL, MIDDLENAME VARCHAR(32), BIRTHDATE TIMESTAMP NOT NULL, ADDRID INTEGER, STREET VARCHAR(64), CITY VARCHAR(64), STATE CHAR(2), ZIPCODE CHAR(5), COUNTRY VARCHAR(64), HIREDATE TIMESTAMP, WEEKLYHOURS REAL, DEPARTMENT INTEGER, FUNDINGDEPT INTEGER, MANAGER INTEGER, MENTOR INTEGER, HRADVISOR INTEGER, SALARY REAL, WAGE REAL, DISCRIMINATOR varchar(255) NOT NULL, CONSTRAINT PERS_DEPT_FK FOREIGN KEY (DEPARTMENT) REFERENCES departments, CONSTRAINT PERS_FUNDDEPT_FK FOREIGN KEY (FUNDINGDEPT) REFERENCES departments, CONSTRAINT PERS_MANAGER_FK FOREIGN KEY (MANAGER) REFERENCES persons, CONSTRAINT PERS_MENTOR_FK FOREIGN KEY (MENTOR) REFERENCES persons, CONSTRAINT PERS_HRADVISOR_FK FOREIGN KEY (HRADVISOR) REFERENCES persons, CONSTRAINT EMPS_PK PRIMARY KEY (PERSONID) ); CREATE TABLE projects ( PROJID INTEGER NOT NULL, NAME VARCHAR(32) NOT NULL, BUDGET DECIMAL(11,2) NOT NULL, DISCRIMINATOR VARCHAR(255), CONSTRAINT PROJS_PK PRIMARY KEY (PROJID) ); CREATE TABLE project_member ( PROJID INTEGER REFERENCES projects NOT NULL, MEMBER INTEGER REFERENCES persons NOT NULL ); ij connect 'jdbc:derby:/Users/clr/apache/jdo/trunk/tck2/target/database/derby/jdotckdb'; ij set schema applicationidentity0; 0 rows inserted/updated/deleted ij select * from persons; PERSONID |FIRSTNAME |LASTNAME |MIDDLENAME |BIRTHDATE |ADDRID |STREET