[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

A B (JIRA) Thu, 16 Apr 2009 09:14:37 -0700

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699739#action_12699739 ]


A B commented on DERBY-3926:
----------------------------

For what it's worth, I agree with everything Bryan wrote in his April 16th 
comment :)

> I think that all *chosen* query plans eventually reach a *complete* join
> order, but a query plan which is *discarded* due to being too expensive may
> never get beyond a *partial* join order. Is that true?

Yes, that's true.

> The comment ("ORDER BY S.A, T.B, S.C") raises the interesting question of the
> situation in which each column, considered individually, is ordered properly,
> but because the ORDER BY clause interleaves columns from different tables, a
> sort is still required.

Yes, it does.  And now that I think about it again, perhaps the code was 
written for a situation like

    "ORDER BY S.A, T.B"    -- Note that we do *NOT* have "S.C".

In that case a partial join order with "S" would satisfy the first order by 
column, and the second order by column, "T.B", would have a table that is not in 
the join order.  Without the logic in question, I think the method would 
determine that a sort was required because table "T" wasn't found.  But the 
logic in question would see if there was anything 'after' T.B, and since there 
isn't, it would say that the partial join order can avoid a sort *so far*, with 
the assumption that if the next optimizable to be placed in the join order is 
"T", we might be able to avoid the sort entirely.

As soon as "S.C" gets added to the list, though, the logic sees that we have 
interleaving columns and therefore correctly requires a sort.
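
To make that reading concrete, here is a minimal Python sketch of the check as I understand it from the description above. It is not Derby's actual code: the function name and data shapes are made up for illustration, it assumes every table already in the join order delivers its columns in the requested order, and it interprets "anything after" as "a later order by column whose table is already in the join order".

```python
def sort_avoidable_so_far(order_by, join_order):
    """Hypothetical model of the check described above (not Derby's code).

    order_by   -- list of (table, column) pairs, in ORDER BY position order
    join_order -- set of table names placed in the (partial) join order
    Returns True if the sort can still be avoided "so far".
    """
    for pos, (table, _column) in enumerate(order_by):
        if table in join_order:
            # Assumption: a placed table already yields this column in order.
            continue
        # The table is not in the join order yet.  If a later ORDER BY
        # column belongs to a table that IS placed, the columns interleave
        # and a sort is required.
        for later_table, _later_column in order_by[pos + 1:]:
            if later_table in join_order:
                return False
        # Nothing after this column depends on a placed table, so the
        # decision is deferred: no sort needed so far.
        return True
    return True


# ORDER BY S.A, T.B with partial join order { S }: avoidable so far.
print(sort_avoidable_so_far([("S", "A"), ("T", "B")], {"S"}))
# ORDER BY S.A, T.B, S.C with partial join order { S }: sort required.
print(sort_avoidable_so_far([("S", "A"), ("T", "B"), ("S", "C")], {"S"}))
```

Under those assumptions the first call reports that the sort is avoidable so far, and the second reports that a sort is required because "S.C" follows "T.B".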

So *if* that's a correct statement of how the code is *supposed* to work, then 
it is actually quite useful and it does make sense.  But there seems to be a 
glitch in the logic--namely, it should perhaps require that a) all of the 
tables for the LEADING SET of order by columns, up to the one whose table 
cannot be found, MUST exist within the join order, *and* b) the leading set of 
order by columns canNOT be empty.  I think the code as written checks for "a", 
but it does not check for "b".

So in the query for this issue, we have "ORDER BY m0.value".  When we get a 
partial join order with { m1 } in it, we check the order by column "m0.value" 
and find that "m0" is not in the (partial) join order.  Today, due to the lack 
of condition "b", we think we can avoid the sort.  But if condition "b" were in 
place, we would see that the "leading set" of order by columns--i.e. the set 
of order by columns before "m0.value"--is EMPTY, which means that "so far" 
nothing is sorted, and thus the sort would be required.
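
As a rough illustration of what adding condition "b" might look like, here is a small Python sketch. Again this is only a model under stated assumptions, not Derby's code and not a tested fix: the function name is hypothetical, tables already in the join order are assumed to deliver their columns in order, and "anything after" is read as "a later order by column whose table is already placed".

```python
def sort_avoidable_with_condition_b(order_by, join_order):
    """Hypothetical model of the check with condition "b" added.

    order_by   -- list of (table, column) pairs, in ORDER BY position order
    join_order -- set of table names placed in the (partial) join order
    """
    for pos, (table, _column) in enumerate(order_by):
        if table in join_order:
            # Condition "a": leading columns must belong to placed tables.
            continue
        if pos == 0:
            # Condition "b": the leading set is empty, so nothing is
            # sorted so far and the sort cannot be skipped.
            return False
        if any(t in join_order for t, _c in order_by[pos + 1:]):
            # Interleaved columns from a placed table: sort required.
            return False
        return True
    return True


# ORDER BY m0.value with partial join order { m1 }: sort required.
print(sort_avoidable_with_condition_b([("m0", "value")], {"m1"}))
# ORDER BY S.A, T.B with partial join order { S }: still avoidable so far.
print(sort_avoidable_with_condition_b([("S", "A"), ("T", "B")], {"S"}))
```

In this model the query from this issue now requires a sort for the partial join order { m1 }, while the "ORDER BY S.A, T.B" case keeps its "avoidable so far" answer.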

I haven't actually tried that out, I'm just writing as things occur to me, so 
this could be incomplete and/or entirely incorrect...

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>         Attachments: derby-reproduce.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large 
> column: VARCHAR(1024). I know it is generally not a good idea to have an 
> index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column 
> but the result is not sorted. It is sorted if I remove the index on that 
> column.
> The output of the attached script is as follows (results should be ordered on 
> the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL,
>     value VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the 
> ij-tool.
> Note that the result of the query becomes correct when enough data is 
> changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and 
> script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is 
> ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
