[
https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711781#action_12711781
]
Mamta A. Satoor commented on DERBY-3926:
----------------------------------------
I went through the optimizer code for following query
connect 'jdbc:derby:wombat';
select * from --DERBY-PROPERTIES joinOrder=FIXED
TENKTUP1 -- DERBY-PROPERTIES index=TK1UNIQUE1
, TENKTUP2 -- DERBY-PROPERTIES index=TK2UNIQUE1
where TENKTUP1.unique1 = TENKTUP2.unique1
order by TENKTUP1.unique1, TENKTUP2.unique1;
During the optimize phase, for an optimizable, everytime we are considering a
new access path for it, we check if the conglomerate being considered for the
optimizable is an index conglomerate. If yes, then we check if the current
ordering for the given join order already has the index columns in it. That
check is done by the following call in FromBaseTable.nextAccessPath: 478
if ( ! rowOrdering.orderedOnColumn(isAscending[i] ? RowOrdering.ASCENDING :
RowOrdering.DESCENDING,
getTableNumber(), baseColumnPositions[i]))
The RowOrderingImpl.orderedOnColumn:119 first checks if the optimizable we are
dealing with is marked as always ordered(I think this happens if the
optimizable is a one-row table). If yes, then the we will return true from
RowOrderingImpl.orderedOnColumn. If not, we check if there was a predicate on
the index column which makes it always ordered(line 127). If not,then we check
if the column is already being ordered by checking ordering vector at line 133.
If not, we go ahead and add the index column to the ordering vector in
currentRowOrdering object.
Going to the query above when the optimizer is going through the join order [0,
-1] which means it is considering TENKTUP1 at the outermost join order position
and no optimizable has yet been considered in the next position yet. For
TENKTUP1, we go through the above piece of code. The method
RowOrderingImpl.orderedOnColumn is going to return false for the index column
UNIQUE1(index TK1UNIQUE1 is on that column) because
a)TENKTUP1 is not always ordered meaning it is not one-row table
b)there is no constant predicate on TENKTUP1.UNIQUE1
c)there is no other ordering on TENKTUP1.UNIQUE1
Since for our query, the index column UNIQUE1 is not already ordered yet in the
currentRowOrdering object, we go ahead and add it to the ordering vector inside
currentRowOrdering object (this is done in FromBaseTable.nextAccessPath at
line: 484 with following 2 code lines.)
rowOrdering.nextOrderPosition(isAscending[i] ? RowOrdering.ASCENDING :
RowOrdering.DESCENDING);
rowOrdering.addOrderedColumn(isAscending[i] ? RowOrdering.ASCENDING :
RowOrdering.DESCENDING,
getTableNumber(), baseColumnPositions[i]);
This adding of TENKTUP1.UNIQUE1 to ordering object will indicate that the rows
are ordered on that column. So, at the end of considering access path for [0,
-1] join order position, currentRowOrdering object will have TENKTUP1.UNIQUE1
in ordering vector and it will be marked as the current column ordering (this
is done by having currentColumnOrdering = TENKTUP1.UNIQUE1).
Next, we work on finding the cost of the given access path for [0, -1] join
order. Once we find the cost, we check to see if it makes sense to avoid sort
on it from what we know so far. This is done in
OptimizerImpl.costBasedCostOptimizable through following piece of code
if (considerSortAvoidance &&
requiredRowOrdering.sortRequired(currentRowOrdering,
assignedTableMap) == RequiredRowOrdering.NOTHING_REQUIRED)
What we are doing here is checking if the ordering requested by the user for
the current table (that information is in requiredRowOrdering) is satisfied by
row ordering provided by the join order being considered so far. The row
ordering provided for the current join order is in currentRowOrdering. In our
specific eg, since user has requested for ordering on TENKTUP1.UNIQUE1 and we
are ordering on it because of the index that is being used, the above call to
requiredRowOrdering.sortRequired is going to return sorting not required.
Next, we put the next optimizable in the join order, meaning moving from [0,-1]
join order to [0, 1]. At this point, optimizer is considering TENKTUP2 in the
2nd join order position. We have asked the optimizer to use index TK2UNIQUE1
for optimizable TENKTUP2. We go through the same code path as above for this
optimizable. We will find that the index column TENKTUP2.unique1 needs to be
added to ordering vector in currentRowOrdering because TENKTUP2 is not one-row
table and there is no constant predicate on TENKTUP2.unique1 and hence the
current index being considered on TENKTUP2 is going to provide ordering on
TENKTUP2.unique1. Next, we see if sort can be aovided for TENKTUP2 by comparing
required row ordering against current row ordering. User has required ordering
on TENKTUP2.unique1 and current row ordering satisfies that ordering because of
the index which is being considered for TENKTUP2.
So, if my understanding of code is correct, the sorting is getting avoided NOT
based on the fact that equality condition exists between the two optimizables
ie TENKTUP1.unique1 = TENKTUP2.unique1 In other words, sorting avoidacne
decision was not based on TENKTUP1.unique1 = TENKTUP2.unique1 It was because
optimizer decides that the individual sorting required on the optimizables have
been satisfied by indexes picked on them.
Hope this helps understand the current behavior of the trunk code for the query
in question.
> Incorrect ORDER BY caused by index
> ----------------------------------
>
> Key: DERBY-3926
> URL: https://issues.apache.org/jira/browse/DERBY-3926
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
> Reporter: Tars Joris
> Assignee: Mamta A. Satoor
> Attachments: d3926_repro.sql, derby-reproduce.zip,
> DERBY3926_notforcheckin_patch1_051109_diff.txt,
> DERBY3926_notforcheckin_patch1_051109_stat.txt,
> DERBY3926_notforcheckin_patch2_051109_diff.txt,
> DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt,
> DERBY3926_patch4_051519_diff.txt, DERBY3926_patch4_051519_stat.txt,
> script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large
> column: VARCHAR(1024). I know it is generally not a good idea to have an
> index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column
> but the result is not sorted. It is sorted if I remove the index on that
> column.
> The output of the attached script is as follows (results should be ordered on
> the middle column):
> ID |VALUE |VALUE
> ----------------------------------------------
> 2147483653 |000002 |21857
> 2147483654 |000003 |21857
> 4294967297 |000001 |21857
> While I would expect:
> ID |VALUE |VALUE
> ----------------------------------------------
> 4294967297 |000001 |21857
> 2147483653 |000002 |21857
> 2147483654 |000003 |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value
> VARCHAR(1024), PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the
> ij-tool.
> Note that the result of the query becomes correct when enough data is
> changed. This prevented me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and
> script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is
> ignoring the order by clause."
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.