Eason Ye created FLINK-36371:
--------------------------------
Summary: Using a BinaryStringData field as the keyBy key causes records
with the same key to be hash-shuffled into different tumble windows
Key: FLINK-36371
URL: https://issues.apache.org/jira/browse/FLINK-36371
Project: Flink
Issue Type: Improvement
Components: API / Type Serialization System
Affects Versions: 2.0.0
Reporter: Eason Ye
The pipeline is: Source (Kafka, deserializes messages into RowData) -> FlatMap
--(keyBy)--> TumbleProcessWindow -> Sink.
The source emits two fields: UserName (String) and Amount (Integer).
The FlatMap receives each BinaryRowData record and emits it to the
TumbleProcessWindow operator.
The KeySelector uses the UserName field, which is a BinaryStringData. The keyBy code is:
{code:java}
.keyBy(rowData -> rowData.getString(0)){code}
The window result is unexpected: records with the same username arrived at
the TumbleProcessWindow operator at the same time, but they were computed in
different windows.
With the following keyBy, the window result is correct:
{code:java}
.keyBy(rowData -> rowData.getString(0).toString()){code}
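The report does not identify a root cause. One general hazard with keys that are views over a larger buffer is that anything reading the whole backing buffer (for example, a generic field-by-field serializer) can see different bytes for two keys whose visible content is identical. The sketch below illustrates only that hazard in plain Java. `ViewKey`, `row`, and `naiveSerialize` are hypothetical names invented for this illustration; they are not Flink classes and are not claimed to match how BinaryStringData or Flink's serializers behave.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ViewKeyDemo {

    /** Hypothetical stand-in for a string that is a view over a row's bytes. */
    static final class ViewKey {
        final byte[] segment; // whole row buffer, shared with the other field
        final int offset;     // where the string starts inside the buffer
        final int length;     // string length in bytes

        ViewKey(byte[] segment, int offset, int length) {
            this.segment = segment;
            this.offset = offset;
            this.length = length;
        }

        /** What a naive serializer might write: the entire backing buffer. */
        byte[] naiveSerialize() {
            return segment.clone();
        }

        /** Copies out only the string's own bytes, like calling toString(). */
        String materialize() {
            return new String(segment, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Builds a toy row: UTF-8 user name followed by a 4-byte amount. */
    static ViewKey row(String userName, int amount) {
        byte[] name = userName.getBytes(StandardCharsets.UTF_8);
        byte[] buf = Arrays.copyOf(name, name.length + 4);
        ByteBuffer.wrap(buf, name.length, 4).putInt(amount);
        return new ViewKey(buf, 0, name.length);
    }

    public static void main(String[] args) {
        ViewKey a = row("alice", 10);
        ViewKey b = row("alice", 20);

        // Materialized keys compare equal because only the name is kept.
        System.out.println(a.materialize().equals(b.materialize()));

        // The naive serialized forms differ because the amount is included.
        System.out.println(Arrays.equals(a.naiveSerialize(), b.naiveSerialize()));
    }
}
```

In this toy model, materializing the key to a `String` removes any dependence on the rest of the row, which is consistent with the `toString()` workaround above producing correct windows. Whether the same mechanism is at work in Flink would need to be confirmed against the actual key serializer in use.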