[ 
https://issues.apache.org/jira/browse/FLINK-38817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052301#comment-18052301
 ] 

Zhu Zhu commented on FLINK-38817:
---------------------------------

 
The proposed solution in the doc converts `ForwardForUnspecifiedPartitioner` to 
`ForwardPartitioner` if the producer task has any input using a global or 
forward partitioner. I think this approach is not correct, as it fails to 
account for scenarios where the output of such a producer may legitimately need 
to be shuffled across multiple parallel tasks, beyond just the `Sort/SortLimit 
+ Sink` pattern.

Only the SQL planner has sufficient context to determine whether data must 
remain in a single partition to preserve ordering. The root issue lies in the 
Flink runtime’s lack of awareness that certain data streams must not be 
redistributed across multiple tasks to maintain order.
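
To illustrate the ordering problem, here is a minimal sketch in plain Java. It 
is a toy model and not Flink code: the class `RescaleOrderDemo` and the method 
`redistribute` are hypothetical names, and round-robin distribution is assumed 
as a simplified stand-in for a Rescale edge with one producer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model, not Flink code: shows how spreading an already sorted stream
// over several consumer subtasks loses the global order.
class RescaleOrderDemo {

    // Distributes records round-robin to `parallelism` consumer subtasks, then
    // concatenates the subtasks' outputs in subtask order (roughly what a
    // reader of the sink's output would see).
    static List<Integer> redistribute(List<Integer> sorted, int parallelism) {
        List<List<Integer>> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            subtasks.add(new ArrayList<>());
        }
        for (int i = 0; i < sorted.size(); i++) {
            subtasks.get(i % parallelism).add(sorted.get(i));
        }
        List<Integer> result = new ArrayList<>();
        for (List<Integer> subtask : subtasks) {
            result.addAll(subtask);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> sorted = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(redistribute(sorted, 1)); // [1, 2, 3, 4, 5, 6]
        System.out.println(redistribute(sorted, 2)); // [1, 3, 5, 2, 4, 6]
    }
}
```

With a single consumer the sorted order survives; with two consumers each 
subtask is still sorted locally, but the combined output is not.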

Therefore, I think the solution would be for the SQL planner to explicitly 
constrain the maxParallelism of downstream operators (e.g., after Sort or 
SortLimit) to 1 when order preservation is required.
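
A rough sketch of that idea, again in plain Java rather than against Flink's 
planner or scheduler APIs. All names here (`MaxParallelismSketch`, 
`planMaxParallelism`, `decideParallelism`, `UNBOUNDED`) are hypothetical, the 
pipeline is assumed to be linear, and the global Sort itself is assumed to run 
as a single task already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Flink's planner or scheduler API.
class MaxParallelismSketch {

    static final int UNBOUNDED = Integer.MAX_VALUE;

    // Given operator names in pipeline order, returns the max parallelism the
    // planner would attach to each one: a global Sort/SortLimit and every
    // operator after it is capped at 1, everything before it is left open.
    static List<Integer> planMaxParallelism(List<String> pipeline) {
        List<Integer> caps = new ArrayList<>();
        boolean orderRequired = false;
        for (String op : pipeline) {
            if (op.equals("Sort") || op.equals("SortLimit")) {
                orderRequired = true;
            }
            caps.add(orderRequired ? 1 : UNBOUNDED);
        }
        return caps;
    }

    // The runtime may pick any parallelism, but never one above the cap.
    static int decideParallelism(int desired, int maxParallelism) {
        return Math.min(desired, maxParallelism);
    }

    public static void main(String[] args) {
        List<String> pipeline = Arrays.asList("Scan", "Sort", "Calc", "Sink");
        List<Integer> caps = planMaxParallelism(pipeline);
        for (int i = 0; i < pipeline.size(); i++) {
            // Prints: Scan -> 8, Sort -> 1, Calc -> 1, Sink -> 1
            System.out.println(pipeline.get(i) + " -> " + decideParallelism(8, caps.get(i)));
        }
    }
}
```

The point of the sketch is that the ordering decision stays in the planner, 
while the runtime only has to respect a cap it already understands.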
 

> Out of order data seen while running tpc-ds queries
> ---------------------------------------------------
>
>                 Key: FLINK-38817
>                 URL: https://issues.apache.org/jira/browse/FLINK-38817
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 2.2.0
>            Reporter: Bonnie Varghese
>            Priority: Major
>         Attachments: screenshot-1.png
>
>
> All unspecified edges are converted to Rescale edges by default for dynamic 
> graphs. Related Jira - https://issues.apache.org/jira/browse/FLINK-25046
> While testing tpc-ds queries I observed that after a global operation the 
> order produced by that operation is not preserved due to Rescale edges.
> For SQL batch to work correctly, we should keep Forward edges after a global 
> operation such as `SortLimit` or `Sort` to ensure data correctness and 
> avoid out-of-order data.
> I have put my observations and experiments in this doc here:
> [https://docs.google.com/document/d/1TTj2ddlQTfDgtGb0ISmiKWt6R9U4RxJ59o6bULC1YtI/edit?usp=sharing]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
