[ https://issues.apache.org/jira/browse/PHOENIX-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated PHOENIX-6232: ------------------------------ Attachment: PHOENIX-6232_addendum-master.diff > Correlated subquery should not push to RegionServer as the probe side of the > Hash join > -------------------------------------------------------------------------------------- > > Key: PHOENIX-6232 > URL: https://issues.apache.org/jira/browse/PHOENIX-6232 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.15.0 > Reporter: Mate Szalay-Beko > Assignee: chenglei > Priority: Major > Fix For: 5.1.0, 4.16.0 > > Attachments: PHOENIX-6232_addendum-4.x.patch, > PHOENIX-6232_addendum-master.diff, PHOENIX-6232_v1-4.x.patch, > PHOENIX-6232_v1-master.patch > > > We were facing an interesting problem when a more complex query (with inner > selects in the WHERE clause) succeeds alone, while the same query fails, if > it is part of a join. I created a test table / query to reproduce the problem: > {code:sql} > DROP TABLE IF EXISTS test; > CREATE TABLE test ( > id INTEGER NOT NULL, > test_id INTEGER, > lastchanged TIMESTAMP, > CONSTRAINT my_pk PRIMARY KEY (id)); > UPSERT INTO test VALUES(0, 100, '2000-01-01 00:00:00.0'); > UPSERT INTO test VALUES(1, 101, '2000-01-01 00:00:00.0'); > UPSERT INTO test VALUES(2, 100, '2011-11-11 11:11:11.0'); > {code} > *Query 1:* Example query, running fine in itself: > {code:sql} > SELECT id, test_id, lastchanged FROM test T > WHERE lastchanged = ( SELECT max(lastchanged) FROM test WHERE test_id = > T.test_id ) > Returns: > +----+---------+-----------------------+ > | ID | TEST_ID | LASTCHANGED | > +----+---------+-----------------------+ > | 1 | 101 | 2000-01-01 01:00:00.0 | > | 2 | 100 | 2011-11-11 12:11:11.0 | > +----+---------+-----------------------+ > {code} > *Query 2:* Same query fails on the current master branch, when it is part of > a larger (implicit) join: > {code:sql} > SELECT AAA.* > FROM > ( > SELECT id, test_id, lastchanged FROM test T > WHERE lastchanged = ( SELECT max(lastchanged) FROM test WHERE test_id = > T.test_id ) > ) as AAA, > ( > SELECT id FROM test > ) as BBB > WHERE AAA.id = BBB.id; > java.lang.IllegalArgumentException > at > org.apache.phoenix.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:128) > at > org.apache.phoenix.compile.TupleProjectionCompiler.createProjectedTable(TupleProjectionCompiler.java:66) > at > org.apache.phoenix.compile.QueryCompiler.compileSingleFlatQuery(QueryCompiler.java:663) > at > org.apache.phoenix.compile.QueryCompiler.compileJoinQuery(QueryCompiler.java:404) > at > org.apache.phoenix.compile.QueryCompiler.compileJoinQuery(QueryCompiler.java:302) > at > org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:249) > at > org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:176) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:504) > at > org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:467) > at > org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:309) > at > org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:298) > at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) > at > org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:297) > at > org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:290) > at > org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1933) > at sqlline.Commands.executeSingleQuery(Commands.java:1054) > at sqlline.Commands.execute(Commands.java:1003) > at sqlline.Commands.sql(Commands.java:967) > at sqlline.SqlLine.dispatch(SqlLine.java:734) > at sqlline.SqlLine.begin(SqlLine.java:541) > at sqlline.SqlLine.start(SqlLine.java:267) > at sqlline.SqlLine.main(SqlLine.java:206) > {code} > I am not sure what the problem is exactly. My guess is that Phoenix tries to > optimize (flatten) an inner-query, which it shouldn't, if we are inside a > join (according to the check in the code which throws the exception). > The best workaround I found was to define an explicit join in the original > query (Query 1), basically change the inner select into a join. This modified > query return the same as the original one: > *Query 3:* > {code:sql} > SELECT T.id, T.test_id, T.lastchanged > FROM > test T > LEFT JOIN ( > SELECT max(lastchanged) AS max_timestamp, > test_id AS max_timestamp_test_id > FROM test > GROUP BY test_id > ) JOIN_TABLE ON JOIN_TABLE.max_timestamp_test_id = T.test_id > WHERE T.lastchanged = JOIN_TABLE.max_timestamp > Returns: > +------+-----------+-----------------------+ > | T.ID | T.TEST_ID | T.LASTCHANGED | > +------+-----------+-----------------------+ > | 1 | 101 | 2000-01-01 01:00:00.0 | > | 2 | 100 | 2011-11-11 12:11:11.0 | > +------+-----------+-----------------------+ > {code} > *Query 4:* And the same modified query (query 3) now works inside a join: > {code:sql} > SELECT AAA.* > FROM > ( > SELECT T.id, T.test_id, T.lastchanged > FROM > test T > LEFT JOIN ( > SELECT max(lastchanged) AS max_timestamp, > test_id AS max_timestamp_test_id > FROM test > GROUP BY test_id > ) JOIN_TABLE ON JOIN_TABLE.max_timestamp_test_id = T.test_id > WHERE T.lastchanged = JOIN_TABLE.max_timestamp > ) as AAA, > ( > SELECT id FROM test > ) as BBB > WHERE AAA.id = BBB.id; > Returns: > +------+-----------+-----------------------+ > | T.ID | T.TEST_ID | T.LASTCHANGED | > +------+-----------+-----------------------+ > | 1 | 101 | 2000-01-01 01:00:00.0 | > | 2 | 100 | 2011-11-11 12:11:11.0 | > +------+-----------+-----------------------+ > {code} > I think Query 4 worked, as it is forcing Phoenix to drop the idea of > optimizing it's inner-query (Query 3). Although, I can be wrong about the > root cause... > Anyway, I think the bug should be fixed and Query 2 should run without > exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)