[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Mamta A. Satoor (JIRA) Thu, 21 May 2009 13:18:12 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711781#action_12711781
 ]


Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I went through the optimizer code for following query
connect 'jdbc:derby:wombat';
select * from --DERBY-PROPERTIES joinOrder=FIXED 
TENKTUP1 -- DERBY-PROPERTIES index=TK1UNIQUE1 
, TENKTUP2 -- DERBY-PROPERTIES index=TK2UNIQUE1 
where TENKTUP1.unique1 = TENKTUP2.unique1 
order by TENKTUP1.unique1, TENKTUP2.unique1; 


During the optimize phase, for an optimizable, everytime we are considering a 
new access path for it, we check if the conglomerate being considered for the 
optimizable is an index conglomerate. If yes, then we check if the current 
ordering for the given join order already has the index columns in it. That 
check is done by the following call in FromBaseTable.nextAccessPath: 478     
if ( ! rowOrdering.orderedOnColumn(isAscending[i] ? RowOrdering.ASCENDING : 
RowOrdering.DESCENDING,
        getTableNumber(), baseColumnPositions[i]))
The RowOrderingImpl.orderedOnColumn:119 first checks if the optimizable we are 
dealing with is marked as always ordered(I think this happens if the 
optimizable is a one-row table). If yes, then the we will return true from 
RowOrderingImpl.orderedOnColumn. If not, we check if there was a predicate on 
the index column which makes it always ordered(line 127). If not,then we check 
if the column is already being ordered by checking ordering vector at line 133. 
If not, we go ahead and add the index column to the ordering vector in 
currentRowOrdering object.

Going to the query above when the optimizer is going through the join order [0, 
-1] which means it is considering TENKTUP1 at the outermost join order position 
and no optimizable has yet been considered in the next position yet. For 
TENKTUP1, we go through the above piece of code. The method 
RowOrderingImpl.orderedOnColumn is going to return false for the index column 
UNIQUE1(index TK1UNIQUE1 is on that column) because 
a)TENKTUP1 is not always ordered meaning it is not one-row table
b)there is no constant predicate on TENKTUP1.UNIQUE1
c)there is no other ordering on TENKTUP1.UNIQUE1

Since for our query, the index column UNIQUE1 is not already ordered yet in the 
currentRowOrdering object, we go ahead and add it to the ordering vector inside 
currentRowOrdering object (this is done in FromBaseTable.nextAccessPath at 
line: 484 with following 2 code lines.)
        rowOrdering.nextOrderPosition(isAscending[i] ? RowOrdering.ASCENDING : 
RowOrdering.DESCENDING);
        rowOrdering.addOrderedColumn(isAscending[i] ? RowOrdering.ASCENDING : 
RowOrdering.DESCENDING,
                getTableNumber(), baseColumnPositions[i]);
This adding of TENKTUP1.UNIQUE1 to ordering object will indicate that the rows 
are ordered on that column. So, at the end of considering access path for [0, 
-1] join order position, currentRowOrdering object will have TENKTUP1.UNIQUE1 
in ordering vector and it will be marked as the current column ordering (this 
is done by having currentColumnOrdering = TENKTUP1.UNIQUE1). 

Next, we work on finding the cost of the given access path for [0, -1] join 
order. Once we find the cost, we check to see if it makes sense to avoid sort 
on it from what we know so far. This is done in 
OptimizerImpl.costBasedCostOptimizable through following piece of code
        if (considerSortAvoidance && 
requiredRowOrdering.sortRequired(currentRowOrdering,
                assignedTableMap) == RequiredRowOrdering.NOTHING_REQUIRED)
What we are doing here is checking if the ordering requested by the user for 
the current table (that information is in requiredRowOrdering) is satisfied by 
row ordering provided by the join order being considered so far. The row 
ordering provided for the current join order is in currentRowOrdering. In our 
specific eg, since user has requested for ordering on TENKTUP1.UNIQUE1 and we 
are ordering on it because of the index that is being used, the above call to 
requiredRowOrdering.sortRequired is going to return sorting not required.

Next, we put the next optimizable in the join order, meaning moving from [0,-1] 
join order to [0, 1]. At this point, optimizer is considering TENKTUP2 in the 
2nd join order position. We have asked the optimizer to use index TK2UNIQUE1 
for optimizable TENKTUP2. We go through the same code path as above for this 
optimizable. We will find that the index column TENKTUP2.unique1 needs to be 
added to ordering vector in currentRowOrdering because TENKTUP2 is not one-row 
table and there is no constant predicate on TENKTUP2.unique1 and hence the 
current index being considered on TENKTUP2 is going to provide ordering on 
TENKTUP2.unique1. Next, we see if sort can be aovided for TENKTUP2 by comparing 
required row ordering against current row ordering. User has required ordering 
on TENKTUP2.unique1 and current row ordering satisfies that ordering because of 
the index which is being considered for TENKTUP2.

So, if my understanding of code is correct, the sorting is getting avoided NOT 
based on the fact that equality condition exists between the two optimizables 
ie TENKTUP1.unique1 = TENKTUP2.unique1 In other words, sorting avoidacne 
decision was not based on TENKTUP1.unique1 = TENKTUP2.unique1 It was because 
optimizer decides that the individual sorting required on the optimizables have 
been satisfied by indexes picked on them.

Hope this helps understand the current behavior of the trunk code for the query 
in question.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, 
> DERBY3926_notforcheckin_patch1_051109_diff.txt, 
> DERBY3926_notforcheckin_patch1_051109_stat.txt, 
> DERBY3926_notforcheckin_patch2_051109_diff.txt, 
> DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, 
> DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt, 
> script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large 
> column: VARCHAR(1024). I know it  is generally not a good idea to have an 
> index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column 
> but the result is not sorted. It is sorted if I remove the index on that 
> column.
> The output of the attached script is as follows (results should be ordered on 
> the middle column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value 
> VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the 
> ij-tool.
> Note that the result of the query becomes correct when enough data is 
> changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and 
> script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is 
> ignoring the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index

Reply via email to