[ 
https://issues.apache.org/jira/browse/FLINK-28674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weike Dong updated FLINK-28674:
-------------------------------
    Description: 
Hi Devs,

Recently I discovered that the _equaliser.equals_ call in 
_org.apache.flink.table.runtime.operators.sink.SinkUpsertMaterializer#removeFirst_
 returns the wrong comparison result when two binary rows are the same, as 
shown below:

!image-2022-07-25-20-56-14-111.png!

After digging through the generated code for this equaliser, I found that 
when the two input RowData are both instances of {_}BinaryRowData{_}, the 
_BinaryRowData#equals_ method is called directly to produce the comparison 
result. 

!image-2022-07-25-20-59-31-933.png!
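
To make the described behaviour concrete, here is a minimal, self-contained Java sketch. It is a toy model and not Flink's generated code: _EqualiserSketch_, _ToyBinaryRow_, _fastPathEquals_ and _fieldWiseEquals_ are hypothetical names, and the differing padding bytes are only an assumed way for two rows with the same timestamp to end up with different raw bytes. The sketch shows how a byte-wise fast path can disagree with a field-by-field comparison.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Toy model (not Flink code): byte-wise fast path vs. field-wise comparison. */
final class EqualiserSketch {

    /** Hypothetical stand-in for a binary row that holds a single timestamp field. */
    static final class ToyBinaryRow {
        private final byte[] bytes;

        // Layout (assumed for this sketch only): 8 bytes millis, 4 bytes nanos, 4 bytes padding.
        ToyBinaryRow(long millis, int nanosOfMilli, int padding) {
            this.bytes = ByteBuffer.allocate(16)
                    .putLong(millis)
                    .putInt(nanosOfMilli)
                    .putInt(padding)
                    .array();
        }

        long millis() {
            return ByteBuffer.wrap(bytes).getLong(0);
        }

        int nanosOfMilli() {
            return ByteBuffer.wrap(bytes).getInt(8);
        }

        /** Compares the raw bytes, including the padding that carries no logical value. */
        @Override
        public boolean equals(Object o) {
            return o instanceof ToyBinaryRow && Arrays.equals(bytes, ((ToyBinaryRow) o).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    /** Models the fast path: both inputs are binary rows, so equals is called directly. */
    static boolean fastPathEquals(ToyBinaryRow left, ToyBinaryRow right) {
        return left.equals(right);
    }

    /** Models a field-by-field comparison that only looks at the logical timestamp. */
    static boolean fieldWiseEquals(ToyBinaryRow left, ToyBinaryRow right) {
        return left.millis() == right.millis()
                && left.nanosOfMilli() == right.nanosOfMilli();
    }

    public static void main(String[] args) {
        ToyBinaryRow a = new ToyBinaryRow(1658753774111L, 0, 0);
        ToyBinaryRow b = new ToyBinaryRow(1658753774111L, 0, 7); // same timestamp, other padding
        System.out.println("fast path:  " + fastPathEquals(a, b));  // prints false
        System.out.println("field-wise: " + fieldWiseEquals(a, b)); // prints true
    }
}
```

In this model the two rows carry the same timestamp, yet the fast path reports them as different because it never looks at the field values themselves.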

However, as the first screenshot shows, _BinaryRowData#equals_ cannot 
properly handle complex data types like {_}Timestamp{_}, so it returns _false_ 
even when the actual timestamp values are the same. As a result, 
SinkUpsertMaterializer wrongly concludes that there is no matching row in 
state and prints errors like "The state is cleared because of state ttl", 
which eventually leads to the loss of -U (UPDATE_BEFORE) data in the final 
results.
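
The consequence can be sketched in the same spirit. The snippet below is a simplified, hypothetical model of a "remove the first matching row" step, not the actual _SinkUpsertMaterializer#removeFirst_ implementation, whose signature and state types are not shown in this report. Rows are modelled as _long[]_ pairs of (timestamp millis, padding), which is an assumption made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Toy model (not Flink code): a false negative from the equaliser leaves a stale row behind. */
final class RemoveFirstSketch {

    /**
     * Removes the first stored row that the equaliser reports as equal to the retraction.
     * Returns false when nothing matched, which is the case the operator would report
     * as state having been cleared.
     */
    static <T> boolean removeFirst(List<T> stored, T retraction, BiPredicate<T, T> equaliser) {
        for (int i = 0; i < stored.size(); i++) {
            if (equaliser.test(stored.get(i), retraction)) {
                stored.remove(i);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Each row is {timestampMillis, padding}; only the first element is a logical value.
        List<long[]> stored = new ArrayList<>();
        stored.add(new long[] {1658753774111L, 0L});
        long[] retraction = {1658753774111L, 7L};

        // Byte-wise style comparison: the padding difference hides the match.
        BiPredicate<long[], long[]> byteWise = (a, b) -> Arrays.equals(a, b);
        System.out.println("byte-wise removed: " + removeFirst(stored, retraction, byteWise));
        System.out.println("rows left in state: " + stored.size());

        // Field-wise comparison: only the timestamp value is compared.
        BiPredicate<long[], long[]> fieldWise = (a, b) -> a[0] == b[0];
        System.out.println("field-wise removed: " + removeFirst(stored, retraction, fieldWise));
        System.out.println("rows left in state: " + stored.size());
    }
}
```

With the byte-wise predicate the retraction finds no match and the stale row stays in the list; with the field-wise predicate the row is removed as expected.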


> EqualiserCodeGenerator generates wrong equaliser for Timestamp fields in 
> BinaryRowData
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-28674
>                 URL: https://issues.apache.org/jira/browse/FLINK-28674
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.13.6, 1.14.5, 1.15.1
>         Environment: Flink 1.13.6
>            Reporter: Weike Dong
>            Priority: Major
>         Attachments: image-2022-07-25-20-56-14-111.png, 
> image-2022-07-25-20-59-31-933.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
