[
https://issues.apache.org/jira/browse/CALCITE-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18076191#comment-18076191
]
zzwqqq commented on CALCITE-7308:
---------------------------------
Thanks [~mbudiu] , that makes sense. I agree that the `VALUES` type inference
is probably not the root cause by itself, but it is one way to expose the issue.
I tried a smaller case with the type declared explicitly:
{code:java}
WITH bonus1(ENAME, JOB, SAL) AS (
VALUES (CAST('WARD' AS CHAR(5)), 'SALESMAN', 1250),
(CAST('ALLEN' AS CHAR(5)), 'SALESMAN', 1600)
)
SELECT bonus1.ename, bonus1.job, bonus1.sal,
emp.empno, emp.ename AS emp_ename
FROM bonus1
INNER JOIN emp ON bonus1.ename = emp.ename
{code}
This returns only ALLEN, not WARD. The plan is:
{code:java}
EnumerableHashJoin(condition=[=($3, $5)], joinType=[inner])
EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):VARCHAR(10) NOT NULL],
proj#0..3=[{exprs}])
EnumerableValues(tuples=[[{ 'WARD ', 'SALESMAN', 1250 }, { 'ALLEN',
'SALESMAN', 1600 }]])
EnumerableCalc(expr#0..7=[{inputs}], proj#0..1=[{exprs}])
EnumerableTableScan(table=[[scott, EMP]]){code}
So the CHAR(5) value is padded to 'WARD ', and the implicit cast to VARCHAR(10)
keeps the trailing space.
>From the enumerable implementation, EnumerableHashJoin passes a comparer to
>hashJoin like this:
{code:java}
Util.first(keyPhysType.comparer(), Expressions.constant(null));{code}
For this plan the join key is a single field, so keyPhysType ends up using
JavaRowFormat.SCALAR. That format does not define a custom comparer, so
keyPhysType.comparer() is null.
Then EnumerableDefaults.hashEquiJoin_ falls back to
inner.toLookup(innerKeySelector), backed by a HashMap. So at runtime the join
key is compared like a normal Java string key: 'WARD' and 'WARD ' are different.
So I would describe it this way: VALUES inference is the trigger in the
original query, but after the value has become padded CHAR(5), the enumerable
implementation casts it to VARCHAR(10) without trimming and the hash join
compares the resulting Java string keys as-is. So 'WARD ' does not match 'WARD'.
> Incorrect Result (Missing Row) When Joining a CTE (Temporary Table) with a
> Persistent Table
> -------------------------------------------------------------------------------------------
>
> Key: CALCITE-7308
> URL: https://issues.apache.org/jira/browse/CALCITE-7308
> Project: Calcite
> Issue Type: Bug
> Reporter: weihua zhang
> Priority: Major
>
> When executing a JOIN operation between a CTE (Common Table Expression,
> acting as a temporary table) and a persistent table, Calcite returns
> incorrect results with one row missing. The expected result should include
> all matching rows from both tables, but the actual output omits one valid
> matching row.
> {code:sql}
> !use scott
> WITH bonus1(ENAME, JOB, SAL, COMM) AS (
> VALUES
> ('ALLEN', 'SALESMAN', 1600.00, 300.00),
> ('WARD', 'SALESMAN', 1250.00, 500.00)
> )
> SELECT * FROM bonus1
> INNER JOIN emp -- `emp` is a persistent table (predefined in the database)
> ON bonus1.ename = emp.ename;
> {code}
> {code:java}
> ENAME, JOB, SAL, COMM, EMPNO, ENAME0, JOB0, MGR, HIREDATE, SAL0, COMM0, DEPTNO
> ALLEN, SALESMAN, 1600.00, 300.00, 7499, ALLEN, SALESMAN, 7698, 1981-02-20,
> 1600.00, 300.00, 30
> !ok
> {code}
> but right result is:
> {code:java}
> ename | job | sal | comm | empno | ename | job | mgr |
> hiredate | sal | comm | deptno
> -------+----------+---------+--------+-------+-------+----------+------+------------+---------+--------+--------+
> WARD | SALESMAN | 1250.00 | 500.00 | 7521 | WARD | SALESMAN | 7698 |
> 1981-02-22 | 1250.00 | 500.00 | 30 |
> ALLEN | SALESMAN | 1600.00 | 300.00 | 7499 | ALLEN | SALESMAN | 7698 |
> 1981-02-20 | 1600.00 | 300.00 | 30 |
> (2 rows)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)