[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Mamta A. Satoor (JIRA) Thu, 16 Apr 2009 00:00:39 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699556#action_12699556
 ]


Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I talked to Army offline on this jira entry and following was his feedback on 
the issue.

He tends to agree that the outermost table m1 should not avoid sorting when the 
ordering required is on a column that is from an optimizable that is not in the 
join order yet. The join order is 0,-1,-1 when m1 decides that we can choose a 
plan which will avoid sorting. The code that makes that decision is in 
OrderByList.sortRequired(...), lines 496 - 533 Apparently, this code has been 
there since Cloudscape 2.0 days. 

**************Start of Army's analysis of the issue*******************
The comment preceding the code segment mentioned above says:

            /*
            ** Check whether the table referred to is in the table map (if any).
            ** If it isn't, we may have an ordering that does not require
            ** sorting for the tables in a partial join order.  Look for
            ** columns beyond this column to see whether a referenced table
            ** is found - if so, sorting is required (for example, in a
            ** case like ORDER BY S.A, T.B, S.C, sorting is required).
            */

The 2nd and 3rd line in comment confuses me.  The only time I can think of 
where we could assume "sort avoidance" was okay when a table number is missing 
would be if the ORDER BY expression did not refer to any tables, ex. if it was 
a constant:

    ORDER BY 'some literal', m0.value

But from what I can tell, we catch that during preprocessing and remove the 
constant, so the above code still wouldn't be useful. (Note: the above ORDER BY 
will actually cause the query to return the correct results because the 
presence of a non-column expression (esp. the literal) causes the optimizer to 
ALWAYS perform a sort--that's a workaround to the problem if the user really 
needs one...).

If the code mentioned above was removed, then I think the optimizer would 
require sorting for the first optimizable (m1), and that would in turn mean 
that we have to sort for the entire join order--which should return the correct 
results.  Of course, the thought of just removing code that's been in there for 
years is a bit scary...It would be nice to understand what the intended use 
case was, but the comments are not clear about that at all.
**************End of Army's analysis of the issue*******************

Based on the above feedback, I commented out the code from 496-533 lines in 
OrderByList and ran the junit tests and they all ran fine. The old harness had 
one test case failing (lang/wisconsin) and it failed because the query plans 
now include sorting when the original plans (without my code changes) did not 
include sorting.  It appears that wisconsin test does not check the results of 
the cursors which are getting prepared. It just opens few cursors in order to 
get their query plans and dumps the query plans without checking the results of 
those queries. So I am not sure if the results from those queries have changed 
because of the additional sorting which is being added into their query plans.

Ofcourse, the problem query shown below works fine with the code commenting 
suggested by Army 
SELECT table1.id, m0.value, m1.value FROM  --DERBY-PROPERTIES joinOrder=FIXED 
table2  m1 -- DERBY-PROPERTIES index=key3
,  table2 m0 -- DERBY-PROPERTIES index=key3
, table1 
WHERE table1.id=m0.id AND m0.name='PageSequenceId' AND table1.id=m1.id AND 
m1.name='PostComponentId' AND m1.value='21857' ORDER BY m0.value;

I would like to know what the community thinks of the code removal suggested by 
Army.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large 
> column: VARCHAR(1024). I know it  is generally not a good idea to have an 
> index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column 
> but the result is not sorted. It is sorted if I remove the index on that 
> column.
> The output of the attached script is as follows (results should be ordered on 
> the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value 
> VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the 
> ij-tool.
> Note that the result of the query becomes correct when enough data is 
> changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and 
> script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is 
> ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Reply via email to