[ 
https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mamta A. Satoor updated DERBY-3926:
-----------------------------------

    Attachment: DERBY3926_patch3_051509_stat.txt
                DERBY3926_patch3_051509_diff.txt

I have attached a patch (DERBY3926_patch3_051509_diff.txt)  that fixes the 
reproducible order by query case. The problem was that when considering inner 
optimizable nodes that required ordering(but the user query has no constant 
comparison predicate(s) on those columns), we did not check if the previous 
optimizables all returned single rows before deciding to avoid sorting. If the 
previous optimizables return more than one row, then that would require 
multiple scans into the inner optimizable and the rows satisfied by the 
multiple scans may or may not be ordered as per the user query requirement. 

The new logic has gone into impl\sql\compile\OptimizerImpl.java and the 
supporting code to determine if the current inner optimizable requires ordering 
on columns with no constant comparison predicates on them is in 
impl\sql\compile\OrderByList.java

The logic has also been explained in detail in OptimizerImpl.java I will 
appreciate if someone can take a look at it. I have run junit tests and 
derbyall. There are two failures in derbyall. One of them is for 
T_RawStoreFactory which I think is existing known jira issue DERBY-3993. The 
other failure is in wisconsin test. Only one sql is failing in wisconsin. That 
sql with some optimizer overrides looks as follows.
select * from   --DERBY-PROPERTIES joinOrder=FIXED 
TENKTUP1   -- DERBY-PROPERTIES index=TK1UNIQUE1
, TENKTUP2  -- DERBY-PROPERTIES index=TK2UNIQUE1
where TENKTUP1.unique1 = TENKTUP2.unique1
order by TENKTUP1.unique1, TENKTUP2.unique1;

As the name of the columns suggest, there are unique indexes on the columns we 
are dealing with in the query above. With my changes in the patch, now we are 
requiring a sort on the top of the query plan. Prior to my changes, we were 
avoiding sort on this query. The reason we are requiring sort as per the new 
logic is - The outermost optimizable is TENKTUP1 and it is going to return more 
than one row. Next, we consider TENKTUP2 as the inner optimizable. We see that 
user has requested ordering on
TENKTUP2.unique1 and there is no constant predicate on TENKTUP2.unique1 AND the 
previous optimizable is not one row resultset and because of these conditions, 
we require that sorting is necessary. I think ideally, we should be able to 
avoid sort because even though the previous optimizable is returning more than 
one row, the current optimizable has equality check with the previous 
optimizable (on the ordered columns) and hence even though there will be 
multiple scans into current optimizable, the rows will
all be ordered because of the equality check. I haven't given this additional 
logic much thought. I will look more into it.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, 
> DERBY3926_notforcheckin_patch1_051109_diff.txt, 
> DERBY3926_notforcheckin_patch1_051109_stat.txt, 
> DERBY3926_notforcheckin_patch2_051109_diff.txt, 
> DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, 
> script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large 
> column: VARCHAR(1024). I know it  is generally not a good idea to have an 
> index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column 
> but the result is not sorted. It is sorted if I remove the index on that 
> column.
> The output of the attached script is as follows (results should be ordered on 
> the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value 
> VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the 
> ij-tool.
> Note that the result of the query becomes correct when enough data is 
> changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and 
> script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is 
> ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to