[jira] Updated: (DERBY-3288) wrong query result in presence of a unique index

A B (JIRA) Mon, 28 Jan 2008 14:41:59 -0800

     [ 
https://issues.apache.org/jira/browse/DERBY-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


A B updated DERBY-3288:
-----------------------

    Attachment: d3288_v2.patch

Attaching d3288_v2.patch, which is a first attempt a complete solution for this 
issue.  The patch does the following:

  1. Move all of the existing "pull optimizable" processing out
     of the getNextPermutation() method and into its own method,
     pullOptimizableFromJoinOrder(), for the sake of clarity.

  2. Add a call to pullOptimizableFromJoinOrder() that updates
     assignedTableMap to correctly reflect the pulling of the
     optimizable in question.
     
  3. Make the getNextPermutation() metod call the new method
     (pullOptimizableFromJoinOrder()) **BEFORE** trying to
     determine what the next legal optimizable for the join
     order can be.

Changes 1 thru 3 above fix the query as reported in this Jira.

But with those changes, store/SpaceTable.sql was failing with an ASSERT failure 
just as it did for d3288_incomplete_v1.patch.  I looked into this and 
eventually tracked it down to the fact that the FromVTI class misuses the 
"referencedTableMap" field that it inherits from ResultSetNode.java.  More 
specifically, that field is intended to hold table numbers for all FromTables 
which appear at or _beneath_ a given FromTable in the query tree.  But for 
FromVTI, the referencedTableMap was holding table numbers for all FromTables 
which were referenced by the _arguments_ to the FromVTI.  As an example, take 
the following query (which is pulled from store/SpaceTable.sql):

  select conglomeratename, isindex,
    numallocatedpages, numfreepages, pagesize, estimspacesaving
  from SYS.SYSSCHEMAS s, SYS.SYSTABLES t,
     new org.apache.derby.diag.SpaceTable(SCHEMANAME,TABLENAME) v
  where s.SCHEMAID = t.SCHEMAID
  and s.SCHEMANAME = 'APP'
  order by conglomeratename;

In this query SpaceTable will be parsed as a FromVTI and will get its own table 
number; SYSSCHEMAS and SYSTABLES will each get its own table number, as well.

That said, the referencedTableMap for the FromVTI should only contain the table 
number of the FromVTI itself--because there are no other FromTables beneath the 
FromVTI in the query tree (esp. FromVTI doesn't have any children nodes that 
are instances of FromTable).  But with the current codeline, the arguments for 
SpaceTable will be processed and their table numbers will be added to FromVTI's 
referencedTableMap, so that FromVTI's referenced table map will contain:

  { SYSSCHEMAS, SYSTABLES, SpaceTable }

instead of

  { SpaceTable }

This confuses the optimizer's dependency checking, which assumes that a 
FromTable's referenced map only contains table numbers for itself and any 
FromTable children. The result of this confusion is a join order that puts 
SpaceTable _before_ SYSSCHEMAS and SYSTABLES, which means that attempts to 
generate the FromVTI wll fail because the tables that it references have not 
yet been generated.

Note that FromVTI _does_ have a dependency on SYSSCHEMAS and SYSTABLES as a 
result of its arguments--that part is correct.  And the current code _does_ try 
to enforce that dependency.  But the misuse of referencedTableMap causes the 
current code to do the wrong thing.  This is because the assignedTableMap used 
by the Optimizer sets/unsets bits based on the referenced table map.  So we end 
up in a situation like this:

  Join order:       { SYSTABLES, SYSSCHEMAS, SpaceTable }
  assignedTableMap: { SYSTABLES, SYSSCHEMAS, SpaceTable }

  -- Pull SpaceTable from the join order.  We do an XOR between
     assignedTableMap and SpaceTable's referenced table map. Since
         SpaceTable's referenced table (incorrectly) includes bits
         for SYSSCHEMAS and SYSTABLES, the XOR yields:

       Join order:       { SYSTABLES, SYSSCHEMAS, -- }
       assignedTableMap: { }  // this is WRONG!

     Note how the assignedTableMap is now empty.  This is wrong--
     after the pull there are still two tables in the join order,
     so assignedTablemap should reflect those tables.  But due
     to the FromVTI's incorrect referenced map, assignedTableMap
     no longer matches the join order.

  -- Pull SYSSCHEMAS from the join order.  We do an XOR between
         assignedTableMap and SYSSCHEMA's referenced table map,
     yielding:

       Join order:       { SYSTABLES, --, -- }
       assignedTableMap: { SYSSCHEMAS }

         Note how assignedTableMap is STILL wrong here.  Since we
     (incorrectly) cleared assignedTableMap above, the XOR with
     SYSSCHEMAS actually _added_ SYSSCHEMAS to the assigned
     table map, which is the opposite of what were supposed
         to do.  Since SYSSCHEMAS is no longer in the join order,
         it should not show up in assignedTableMap.

  -- Attempt to place SpaceTable in the join order. Since it
     depends on { SYSSCHEMAS, SYSTABLES } and assignedTableMap
     does not include both of those, we can't place it.  So
     we'll pull again...

  -- Pull SYSTABLES from the join order.  We do an XOR between
         assignedTableMap and SYSTABLES's referenced table map,
     yielding:

       Join order:       { --, --, -- }
       assignedTableMap: { SYSSCHEMAS, SYSTABLES }

         So our assignedTableMap is completely wrong at this point.
         Since we (incorrectly) cleared assignedTableMap when we
         pulled SpaceTable, our assigned table map now indicates that
         SYSSCHEMAS and SYSTABLES are both already in the join order,
     when in truth the join order is EMPTY at this point.

  -- Attempt to place SpaceTable in the join order.  Since it
     depends on { SYSSCHEMAS, SYSTABLES } and since assignedTableMap
     incorrectly includes both of those, the optimizer will place
     SpaceTable into the first position in the join order.  We
     end up with:

       Join order:       { SpaceTable, --, -- }
       assignedTableMap: { SYSSCHEMAS, SYSTABLES, SpaceTable }

This becomes the "cheapest" join order that the optimizer finds, so this is 
what Derby will try to generate.  But since SpaceTable depends on SYSSCHEMAS 
and SYSTABLES, which will only be generated _after_ SpaceTable (because 
FromTables are generated in the order in which they appear in the join order), 
the generation code for SpaceTable will fail.

All of that to say that FromVTI should not include table numbers for its 
arguments in its referencedTableMap.  So with that in mind, the d3288_v2.patch 
also does the following:

  4. Fix FromVTI to correctly set up its referencedTableMap and
     dependency map, without mixing the two together.

  5. Remove some assignedTableMap "clear" (XOR) logic that appears
     to only have been necessary due to the bug in FromVTI--at least
     as far as I can tell.  Removing this logic makes it so that
     there is now a _single_ place in the code where we "add to" the
     assigned table map (namely, when placing an optimizable), and
     there is a _single_ place where we "remove from" the assigned
     table map (namely, when pulling an optimizable).

  6. Add the repro for this issue to the existing lang/subqueryFlattening.sql
     tests, since that's where EXISTS flattening is currently tested.

I ran derbyall and suites.All with an earlier version of this patch and 
everything ran cleanly.  Since then I have added a few ASSERT statements to 
sanity check the contents of assignedTableMap at certain points during the join 
order processing.  I have not yet run the regression suites with these ASSERTs 
in place, but plan to do so shortly.

In the meantime, if anyone has any comments on the patch as posted, I'd be glad 
to hear them.

Note to reviewers: The patch itself is 929 lines, but the vast majority of that 
comes from the relocation of "pull optimizable" processing into its own method.

> wrong query result in presence of a unique index
> ------------------------------------------------
>
>                 Key: DERBY-3288
>                 URL: https://issues.apache.org/jira/browse/DERBY-3288
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.2.1.6, 10.2.2.0, 10.3.1.4, 10.3.2.1, 10.4.0.0
>            Reporter: Gerald Khin
>            Assignee: A B
>         Attachments: d3288_incomplete_v1.patch, d3288_v2.patch, DERBY-3288.htm
>
>
> The DDL to reproduce the bug is:
> CREATE TABLE tab_a (PId BIGINT NOT NULL);
> CREATE TABLE tab_c (Id BIGINT NOT NULL PRIMARY KEY, PAId BIGINT NOT NULL, 
> PBId BIGINT NOT NULL);
> INSERT INTO tab_c VALUES (91, 81, 82);
> INSERT INTO tab_c VALUES (92, 81, 84);
> INSERT INTO tab_c VALUES (93, 81, 88);
> INSERT INTO tab_c VALUES (96, 81, 83);
> CREATE TABLE tab_v (OId BIGINT NOT NULL , UGId BIGINT NOT NULL, val CHAR(1) 
> NOT NULL);
> CREATE UNIQUE INDEX tab_v_i1 ON tab_v (OId, UGId, val);
> CREATE INDEX tab_v_i2 ON tab_v (UGId, val, OId);
> INSERT INTO tab_v VALUES (81, 31, 'A'); 
> INSERT INTO tab_v VALUES (82, 31, 'A'); 
> INSERT INTO tab_v VALUES (83, 31, 'A'); 
> INSERT INTO tab_v VALUES (84, 31, 'A'); 
> INSERT INTO tab_v VALUES (85, 31, 'A'); 
> INSERT INTO tab_v VALUES (86, 31, 'A'); 
> INSERT INTO tab_v VALUES (87, 31, 'A'); 
> INSERT INTO tab_v VALUES (81, 32, 'A'); 
> INSERT INTO tab_v VALUES (82, 32, 'A'); 
> INSERT INTO tab_v VALUES (83, 32, 'A'); 
> INSERT INTO tab_v VALUES (84, 32, 'A'); 
> INSERT INTO tab_v VALUES (85, 32, 'A'); 
> INSERT INTO tab_v VALUES (86, 32, 'A'); 
> INSERT INTO tab_v VALUES (87, 32, 'A');
> CREATE TABLE tab_b (Id BIGINT NOT NULL PRIMARY KEY, OId BIGINT NOT NULL);
> INSERT INTO tab_b VALUES (141, 81);
> INSERT INTO tab_b VALUES (142, 82);
> INSERT INTO tab_b VALUES (143, 84);
> INSERT INTO tab_b VALUES (144, 88);
> INSERT INTO tab_b VALUES (151, 81);
> INSERT INTO tab_b VALUES (152, 83);
> CREATE TABLE tab_d (Id BIGINT NOT NULL PRIMARY KEY, PAId BIGINT NOT NULL, 
> PBId BIGINT NOT NULL);
> INSERT INTO tab_d VALUES (181, 141, 142);
> INSERT INTO tab_d VALUES (182, 141, 143);
> INSERT INTO tab_d VALUES (186, 151, 152);
> The query returning the wrong result is:
> SELECT tab_b.Id
> FROM tab_b JOIN tab_c ON (tab_b.OId = tab_c.PAId OR tab_b.OId = tab_c.PBId) 
> LEFT OUTER JOIN tab_a ON tab_b.OId = PId 
> WHERE EXISTS (SELECT 'X' FROM tab_d WHERE (PAId = 141 AND PBId = tab_b.Id) OR 
> (PBId = 141 AND PAId = tab_b.Id)) 
>   AND EXISTS (SELECT 'X' FROM tab_v WHERE OId = tab_b.OId AND UGId = 31 AND 
> val = 'A')
> The result should consist of two rows (142),(143), but it returns only one 
> row (142).
> The correct result would be returned if the index tab_v_i1 had been created 
> as non-unique.
> The correct result would also be returned if the condition ...AND val='A' had 
> been replaced by ...AND val='A'  || ''.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3288) wrong query result in presence of a unique index

Reply via email to