[ https://issues.apache.org/jira/browse/DERBY-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
A B updated DERBY-3288: ----------------------- Attachment: d3288_v2.patch Attaching d3288_v2.patch, which is a first attempt a complete solution for this issue. The patch does the following: 1. Move all of the existing "pull optimizable" processing out of the getNextPermutation() method and into its own method, pullOptimizableFromJoinOrder(), for the sake of clarity. 2. Add a call to pullOptimizableFromJoinOrder() that updates assignedTableMap to correctly reflect the pulling of the optimizable in question. 3. Make the getNextPermutation() metod call the new method (pullOptimizableFromJoinOrder()) **BEFORE** trying to determine what the next legal optimizable for the join order can be. Changes 1 thru 3 above fix the query as reported in this Jira. But with those changes, store/SpaceTable.sql was failing with an ASSERT failure just as it did for d3288_incomplete_v1.patch. I looked into this and eventually tracked it down to the fact that the FromVTI class misuses the "referencedTableMap" field that it inherits from ResultSetNode.java. More specifically, that field is intended to hold table numbers for all FromTables which appear at or _beneath_ a given FromTable in the query tree. But for FromVTI, the referencedTableMap was holding table numbers for all FromTables which were referenced by the _arguments_ to the FromVTI. As an example, take the following query (which is pulled from store/SpaceTable.sql): select conglomeratename, isindex, numallocatedpages, numfreepages, pagesize, estimspacesaving from SYS.SYSSCHEMAS s, SYS.SYSTABLES t, new org.apache.derby.diag.SpaceTable(SCHEMANAME,TABLENAME) v where s.SCHEMAID = t.SCHEMAID and s.SCHEMANAME = 'APP' order by conglomeratename; In this query SpaceTable will be parsed as a FromVTI and will get its own table number; SYSSCHEMAS and SYSTABLES will each get its own table number, as well. That said, the referencedTableMap for the FromVTI should only contain the table number of the FromVTI itself--because there are no other FromTables beneath the FromVTI in the query tree (esp. FromVTI doesn't have any children nodes that are instances of FromTable). But with the current codeline, the arguments for SpaceTable will be processed and their table numbers will be added to FromVTI's referencedTableMap, so that FromVTI's referenced table map will contain: { SYSSCHEMAS, SYSTABLES, SpaceTable } instead of { SpaceTable } This confuses the optimizer's dependency checking, which assumes that a FromTable's referenced map only contains table numbers for itself and any FromTable children. The result of this confusion is a join order that puts SpaceTable _before_ SYSSCHEMAS and SYSTABLES, which means that attempts to generate the FromVTI wll fail because the tables that it references have not yet been generated. Note that FromVTI _does_ have a dependency on SYSSCHEMAS and SYSTABLES as a result of its arguments--that part is correct. And the current code _does_ try to enforce that dependency. But the misuse of referencedTableMap causes the current code to do the wrong thing. This is because the assignedTableMap used by the Optimizer sets/unsets bits based on the referenced table map. So we end up in a situation like this: Join order: { SYSTABLES, SYSSCHEMAS, SpaceTable } assignedTableMap: { SYSTABLES, SYSSCHEMAS, SpaceTable } -- Pull SpaceTable from the join order. We do an XOR between assignedTableMap and SpaceTable's referenced table map. Since SpaceTable's referenced table (incorrectly) includes bits for SYSSCHEMAS and SYSTABLES, the XOR yields: Join order: { SYSTABLES, SYSSCHEMAS, -- } assignedTableMap: { } // this is WRONG! Note how the assignedTableMap is now empty. This is wrong-- after the pull there are still two tables in the join order, so assignedTablemap should reflect those tables. But due to the FromVTI's incorrect referenced map, assignedTableMap no longer matches the join order. -- Pull SYSSCHEMAS from the join order. We do an XOR between assignedTableMap and SYSSCHEMA's referenced table map, yielding: Join order: { SYSTABLES, --, -- } assignedTableMap: { SYSSCHEMAS } Note how assignedTableMap is STILL wrong here. Since we (incorrectly) cleared assignedTableMap above, the XOR with SYSSCHEMAS actually _added_ SYSSCHEMAS to the assigned table map, which is the opposite of what were supposed to do. Since SYSSCHEMAS is no longer in the join order, it should not show up in assignedTableMap. -- Attempt to place SpaceTable in the join order. Since it depends on { SYSSCHEMAS, SYSTABLES } and assignedTableMap does not include both of those, we can't place it. So we'll pull again... -- Pull SYSTABLES from the join order. We do an XOR between assignedTableMap and SYSTABLES's referenced table map, yielding: Join order: { --, --, -- } assignedTableMap: { SYSSCHEMAS, SYSTABLES } So our assignedTableMap is completely wrong at this point. Since we (incorrectly) cleared assignedTableMap when we pulled SpaceTable, our assigned table map now indicates that SYSSCHEMAS and SYSTABLES are both already in the join order, when in truth the join order is EMPTY at this point. -- Attempt to place SpaceTable in the join order. Since it depends on { SYSSCHEMAS, SYSTABLES } and since assignedTableMap incorrectly includes both of those, the optimizer will place SpaceTable into the first position in the join order. We end up with: Join order: { SpaceTable, --, -- } assignedTableMap: { SYSSCHEMAS, SYSTABLES, SpaceTable } This becomes the "cheapest" join order that the optimizer finds, so this is what Derby will try to generate. But since SpaceTable depends on SYSSCHEMAS and SYSTABLES, which will only be generated _after_ SpaceTable (because FromTables are generated in the order in which they appear in the join order), the generation code for SpaceTable will fail. All of that to say that FromVTI should not include table numbers for its arguments in its referencedTableMap. So with that in mind, the d3288_v2.patch also does the following: 4. Fix FromVTI to correctly set up its referencedTableMap and dependency map, without mixing the two together. 5. Remove some assignedTableMap "clear" (XOR) logic that appears to only have been necessary due to the bug in FromVTI--at least as far as I can tell. Removing this logic makes it so that there is now a _single_ place in the code where we "add to" the assigned table map (namely, when placing an optimizable), and there is a _single_ place where we "remove from" the assigned table map (namely, when pulling an optimizable). 6. Add the repro for this issue to the existing lang/subqueryFlattening.sql tests, since that's where EXISTS flattening is currently tested. I ran derbyall and suites.All with an earlier version of this patch and everything ran cleanly. Since then I have added a few ASSERT statements to sanity check the contents of assignedTableMap at certain points during the join order processing. I have not yet run the regression suites with these ASSERTs in place, but plan to do so shortly. In the meantime, if anyone has any comments on the patch as posted, I'd be glad to hear them. Note to reviewers: The patch itself is 929 lines, but the vast majority of that comes from the relocation of "pull optimizable" processing into its own method. > wrong query result in presence of a unique index > ------------------------------------------------ > > Key: DERBY-3288 > URL: https://issues.apache.org/jira/browse/DERBY-3288 > Project: Derby > Issue Type: Bug > Components: SQL > Affects Versions: 10.2.1.6, 10.2.2.0, 10.3.1.4, 10.3.2.1, 10.4.0.0 > Reporter: Gerald Khin > Assignee: A B > Attachments: d3288_incomplete_v1.patch, d3288_v2.patch, DERBY-3288.htm > > > The DDL to reproduce the bug is: > CREATE TABLE tab_a (PId BIGINT NOT NULL); > CREATE TABLE tab_c (Id BIGINT NOT NULL PRIMARY KEY, PAId BIGINT NOT NULL, > PBId BIGINT NOT NULL); > INSERT INTO tab_c VALUES (91, 81, 82); > INSERT INTO tab_c VALUES (92, 81, 84); > INSERT INTO tab_c VALUES (93, 81, 88); > INSERT INTO tab_c VALUES (96, 81, 83); > CREATE TABLE tab_v (OId BIGINT NOT NULL , UGId BIGINT NOT NULL, val CHAR(1) > NOT NULL); > CREATE UNIQUE INDEX tab_v_i1 ON tab_v (OId, UGId, val); > CREATE INDEX tab_v_i2 ON tab_v (UGId, val, OId); > INSERT INTO tab_v VALUES (81, 31, 'A'); > INSERT INTO tab_v VALUES (82, 31, 'A'); > INSERT INTO tab_v VALUES (83, 31, 'A'); > INSERT INTO tab_v VALUES (84, 31, 'A'); > INSERT INTO tab_v VALUES (85, 31, 'A'); > INSERT INTO tab_v VALUES (86, 31, 'A'); > INSERT INTO tab_v VALUES (87, 31, 'A'); > INSERT INTO tab_v VALUES (81, 32, 'A'); > INSERT INTO tab_v VALUES (82, 32, 'A'); > INSERT INTO tab_v VALUES (83, 32, 'A'); > INSERT INTO tab_v VALUES (84, 32, 'A'); > INSERT INTO tab_v VALUES (85, 32, 'A'); > INSERT INTO tab_v VALUES (86, 32, 'A'); > INSERT INTO tab_v VALUES (87, 32, 'A'); > CREATE TABLE tab_b (Id BIGINT NOT NULL PRIMARY KEY, OId BIGINT NOT NULL); > INSERT INTO tab_b VALUES (141, 81); > INSERT INTO tab_b VALUES (142, 82); > INSERT INTO tab_b VALUES (143, 84); > INSERT INTO tab_b VALUES (144, 88); > INSERT INTO tab_b VALUES (151, 81); > INSERT INTO tab_b VALUES (152, 83); > CREATE TABLE tab_d (Id BIGINT NOT NULL PRIMARY KEY, PAId BIGINT NOT NULL, > PBId BIGINT NOT NULL); > INSERT INTO tab_d VALUES (181, 141, 142); > INSERT INTO tab_d VALUES (182, 141, 143); > INSERT INTO tab_d VALUES (186, 151, 152); > The query returning the wrong result is: > SELECT tab_b.Id > FROM tab_b JOIN tab_c ON (tab_b.OId = tab_c.PAId OR tab_b.OId = tab_c.PBId) > LEFT OUTER JOIN tab_a ON tab_b.OId = PId > WHERE EXISTS (SELECT 'X' FROM tab_d WHERE (PAId = 141 AND PBId = tab_b.Id) OR > (PBId = 141 AND PAId = tab_b.Id)) > AND EXISTS (SELECT 'X' FROM tab_v WHERE OId = tab_b.OId AND UGId = 31 AND > val = 'A') > The result should consist of two rows (142),(143), but it returns only one > row (142). > The correct result would be returned if the index tab_v_i1 had been created > as non-unique. > The correct result would also be returned if the condition ...AND val='A' had > been replaced by ...AND val='A' || ''. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.