[ 
https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711301#action_12711301
 ] 

Bryan Pendleton commented on DERBY-3926:
----------------------------------------

I've been following this discussion, and learning a lot! Thanks much for the 
careful writeups and explanations.

It seems to me that the problem with the current Wisconsin query has to do with 
the precise definition of "one row result set".

If we go back to Mike's original insight some time ago, he said:

> the key here is that the outer table(table3) is returning more than one row
> and each one of those row is requiring us to look at the middle table
> (table2) which results into 3 scans on table2

And that seems correct to me. But at some point I think we lost the "AND" part
of Mike's statement when it was translated into the bits-and-bytes of actual
code.

That is, in the Wisconsin query in question, the outer table (TENKTUP1) is
returning more than one row, *however* each one of those rows only results in
a single open of the inner table (TENKTUP2), because the join key is unique.

This comment from the patch seems to describe the essence of the issue:

+        * ... and hence we need to make sure 
+        * that the outer predicates in the join order are all
+        * one row optimizables meaning that they can at the
+        * most return only one row. If they return more than
+        * one row, then it will require multiple scans of the
+        * current optimizable and the rows returned from
+        * those multiple scans may not be ordered correctly.
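
To make the failure mode in that comment concrete, here is a minimal toy
sketch, not Derby code. It assumes a nested-loop join in which every outer row
opens a fresh scan of an inner index ordered on the ORDER BY column. The class
and method names are hypothetical; the ids and values are the three rows from
the bug report.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: hypothetical names, not Derby classes.
class MultiScanOrderSketch {

    // Inner-table row: the join key (id) plus the ORDER BY column (value).
    static final class InnerRow {
        final long id;
        final String value;
        InnerRow(long id, String value) { this.id = id; this.value = value; }
    }

    // For each outer id, open one scan of the inner "index on value"
    // (the inner rows sorted by value) and emit the matching values.
    static List<String> nestedLoopJoin(List<Long> outerIds, List<InnerRow> innerTable) {
        List<InnerRow> indexOnValue = new ArrayList<>(innerTable);
        indexOnValue.sort(Comparator.comparing((InnerRow r) -> r.value));
        List<String> emitted = new ArrayList<>();
        for (long outerId : outerIds) {          // one inner scan per outer row
            for (InnerRow row : indexOnValue) {
                if (row.id == outerId) {
                    emitted.add(row.value);
                }
            }
        }
        return emitted;
    }

    static boolean isSorted(List<String> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1).compareTo(values.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<InnerRow> inner = List.of(
            new InnerRow(2147483653L, "000002"),
            new InnerRow(2147483654L, "000003"),
            new InnerRow(4294967297L, "000001"));

        // Three outer rows: three inner scans, each ordered on its own,
        // but the concatenation is not ordered on value.
        List<String> many = nestedLoopJoin(
            List.of(2147483653L, 2147483654L, 4294967297L), inner);
        System.out.println(many + " sorted=" + isSorted(many));
        // prints: [000002, 000003, 000001] sorted=false

        // One outer row: a single inner scan, so the index order survives.
        List<String> one = nestedLoopJoin(List.of(4294967297L), inner);
        System.out.println(one + " sorted=" + isSorted(one));
        // prints: [000001] sorted=true
    }
}
```

The unsorted output matches the row order shown in the bug report, which is
the case the patch comment is guarding against.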

The thing I think we need to do is to figure out some way to encode the 
following test:

   Are the outer predicates in the join such that they will require at most
   one scan of the current optimizable?

I think the reason the Wisconsin query has changed behavior (now includes an
unnecessary sort) is because the patch isn't quite expressing exactly this
idea. Outer predicates which are *not* one row result sets should still be
able to perform sort avoidance plans, *as long as* the join condition will
perform only a single open of the inner optimizable.
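
As a rough illustration of one reading of this point, the toy sketch below (not
Derby code) joins a multi-row outer table to an inner table through a unique
join key. The Wisconsin query itself is not shown in this comment, so the
sketch assumes that the ORDER BY column comes from the outer table and that
the outer rows arrive in index order on it. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Derby classes.
class UniqueJoinKeySketch {

    // outerKeysInIndexOrder stands in for the outer table (TENKTUP1), already
    // in ORDER BY order (an assumption). innerUniqueIndex stands in for a
    // unique index on the inner table (TENKTUP2): each probe finds at most
    // one row.
    static List<Integer> join(List<Integer> outerKeysInIndexOrder,
                              Map<Integer, String> innerUniqueIndex) {
        List<Integer> emitted = new ArrayList<>();
        for (int key : outerKeysInIndexOrder) {
            String match = innerUniqueIndex.get(key);  // at most one inner row
            if (match != null) {
                emitted.add(key);
            }
        }
        return emitted;
    }

    static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The outer table returns four rows, so it is not a one row result set.
        List<Integer> outer = List.of(1, 3, 5, 7);
        Map<Integer, String> inner = Map.of(1, "a", 5, "b", 7, "c");

        List<Integer> result = join(outer, inner);
        System.out.println(result + " sorted=" + isSorted(result));
        // prints: [1, 5, 7] sorted=true
    }
}
```

Here the outer table is not a one row result set, yet the join output stays in
order without a sort, which is the kind of plan a strict "all outer
optimizables must be one row" test would reject.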


> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, 
> DERBY3926_notforcheckin_patch1_051109_diff.txt, 
> DERBY3926_notforcheckin_patch1_051109_stat.txt, 
> DERBY3926_notforcheckin_patch2_051109_diff.txt, 
> DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, 
> DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, 
> script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large 
> column: VARCHAR(1024). I know it  is generally not a good idea to have an 
> index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column 
> but the result is not sorted. It is sorted if I remove the index on that 
> column.
> The output of the attached script is as follows (results should be ordered on 
> the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value 
> VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the 
> ij-tool.
> Note that the result of the query becomes correct when enough data is 
> changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and 
> script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is 
> ignoring the order by clause."
