[ 
https://issues.apache.org/jira/browse/FLINK-20986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271273#comment-17271273
 ] 

Tianshi Zhu commented on FLINK-20986:
-------------------------------------

[~twalthr] parts of the first job, generated by the old class loader, are kept in 
Calcite. They differ from the Class objects generated by the other class loader 
used for the second job, and cause the second job's SQL validation to fail. I 
was working on something else last week, so I haven't looked into why the two 
jobs use different class loaders. Also, creating two different class loaders in 
a unit test isn't trivial.

I agree that fixing `GenericTypeInfo` probably would not fix the root cause. 
But I don't think the cache is the problem here either.

One problem is with the Interner: intern(a).equals(a) doesn't hold when a is a 
RelDataType that contains a `GenericTypeInfo` with a different Class object, and 
there may be other TypeInformation classes that break this law 
(https://guava.dev/releases/20.0/api/docs/com/google/common/collect/Interner.html#intern-E-).

The other problem is with the loadingCache: different RelDataTypes may produce 
the same key.
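
A minimal Java sketch of how such a key collision can swap one job's type for 
another's. Everything in it is an assumption made for illustration: 
`TypeDescriptor`, `cacheKey` and `canonize` are hypothetical names, a plain 
HashMap stands in for the Guava LoadingCache, and a bare Object token stands in 
for the Class object that each class loader would produce.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyCollisionSketch {

    /** Hypothetical stand-in for a type that wraps a Class object. */
    static final class TypeDescriptor {
        final String className;  // the only thing the key is derived from
        final Object classToken; // stands in for Class<T> from one class loader

        TypeDescriptor(String className, Object classToken) {
            this.className = className;
            this.classToken = classToken;
        }

        /** Assumed key derivation: it ignores which loader produced the class. */
        String cacheKey() {
            return "GENERIC(" + className + ")";
        }
    }

    /** Static cache shared by every job that runs in the same JVM. */
    static final Map<String, TypeDescriptor> CACHE = new HashMap<>();

    /** Returns the first descriptor ever seen for the given key. */
    static TypeDescriptor canonize(TypeDescriptor type) {
        return CACHE.computeIfAbsent(type.cacheKey(), key -> type);
    }

    public static void main(String[] args) {
        TypeDescriptor firstJob = new TypeDescriptor("com.example.Pojo", new Object());
        TypeDescriptor secondJob = new TypeDescriptor("com.example.Pojo", new Object());

        canonize(firstJob);
        TypeDescriptor seenBySecondJob = canonize(secondJob);

        // The second job gets the first job's descriptor back.
        System.out.println(seenBySecondJob == firstJob);                         // true
        // Its token no longer matches what the second job put in.
        System.out.println(seenBySecondJob.classToken == secondJob.classToken); // false
    }
}
```

The point is only that a key with less information than the value lets a stale 
value leak across jobs; the real key and value types in Calcite are different.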

Those are at least the two problems I observed. My hack is to clean up the two 
caches manually (using reflection...) before running each job. But I wish we 
could fix the root causes.
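
For illustration, a reflective reset of a private static cache could look 
roughly like the Java sketch below. It is not the code used for this hack: 
`CachingFactory`, `TYPE_CACHE` and `clearStaticMap` are hypothetical names, and 
the cache here is a plain Map, whereas the real caches are Guava structures that 
would need different handling.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheResetSketch {

    /** Hypothetical stand-in for a factory that hides a static cache. */
    static final class CachingFactory {
        private static final Map<String, Object> TYPE_CACHE = new ConcurrentHashMap<>();

        static Object canonize(String key, Object value) {
            return TYPE_CACHE.computeIfAbsent(key, k -> value);
        }

        static int cachedEntries() {
            return TYPE_CACHE.size();
        }
    }

    /** Clears a static Map-typed field that is not otherwise reachable. */
    static void clearStaticMap(Class<?> owner, String fieldName)
            throws ReflectiveOperationException {
        Field field = owner.getDeclaredField(fieldName);
        field.setAccessible(true);        // bypass the private modifier
        Object value = field.get(null);   // null receiver: the field is static
        if (!(value instanceof Map)) {
            throw new IllegalStateException(fieldName + " is not a Map");
        }
        ((Map<?, ?>) value).clear();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        CachingFactory.canonize("GENERIC(com.example.Pojo)", new Object());
        System.out.println(CachingFactory.cachedEntries()); // 1

        // Run before each job so nothing from the previous job survives.
        clearStaticMap(CachingFactory.class, "TYPE_CACHE");
        System.out.println(CachingFactory.cachedEntries()); // 0
    }
}
```

A reset like this depends on private field names and can break on any library 
upgrade, which is why it is only a workaround.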

> GenericTypeInfo equality issue
> ------------------------------
>
>                 Key: FLINK-20986
>                 URL: https://issues.apache.org/jira/browse/FLINK-20986
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>    Affects Versions: 1.12.0
>            Reporter: Tianshi Zhu
>            Priority: Major
>              Labels: pull-request-available
>
> When trying to use Flink REST api to run a job that uses Flink table api with 
> blink planner, we encountered an issue about `Incompatible types of 
> expression and result type.` from 
> org.apache.flink.table.planner.codegen.ExprCodeGenerator$$anonfun$generateResultExpression$1.apply(ExprCodeGenerator.scala:311).
>  This issue only happens after the first request has been handled 
> successfully.
>  
> After digging, we found that there are two static caches used inside calcite's
> RelDataTypeFactoryImpl (
> https://github.com/apache/calcite/blob/d9a81b88ad561e7e4cedae93e805e0d7a53a7f1a/core/src/main/java/org/apache/calcite/rel/type/RelDataTypeFactoryImpl.java#L352-L376
> ), which will remember the types they have seen. The `canonize` method is 
> called from FlinkTypeFactory 
> https://github.com/apache/flink/blob/89f9dcd70dc3a1433055e17775b2b2a2c796ca94/flink-table/flink-table-planner-blink/src/main/scala/org/apache/flink/table/planner/calcite/FlinkTypeFactory.scala#L292
>  
> This causes a problem for us because, in our experience, we have seen 
> GenericTypeInfo<T> containing different Class<T> instances in two different 
> REST requests, and they are not equal, because 
> [https://github.com/apache/flink/blob/89f9dcd70dc3a1433055e17775b2b2a2c796ca94/flink-core/src/main/java/org/apache/flink/api/java/typeutils/GenericTypeInfo.java#L124]
>  is using object equality. After `canonize`, the GenericTypeInfo for other 
> REST requests would be changed to the GenericTypeInfo used for the first REST 
> request, which is cached in RelDataTypeFactoryImpl. And this leads to the 
> incompatible type error mentioned above.
>  
> I want to propose using class name for equality comparison inside 
> GenericTypeInfo, and change hashCode method accordingly.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
