Yuxin Tan created FLINK-35690:
---------------------------------

             Summary: Release Testing: Verify FLIP-459: Support Flink hybrid 
shuffle integration with Apache Celeborn
                 Key: FLINK-35690
                 URL: https://issues.apache.org/jira/browse/FLINK-35690
             Project: Flink
          Issue Type: Sub-task
            Reporter: Yuxin Tan
             Fix For: 1.20.0


This is a follow-up test for https://issues.apache.org/jira/browse/FLINK-35533

In Flink 1.20, we proposed integrating Flink's Hybrid Shuffle with Apache 
Celeborn through a pluggable remote tier interface. To verify this feature, 
follow these two main steps.

1. Implement Celeborn tier.
 * Implement a new tier factory and tier for Celeborn, covering the APIs 
TierFactory/TierMasterAgent/TierProducerAgent/TierConsumerAgent.
 * The implementations should support granular data management at the Segment 
level on both the client and server sides.
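
As an illustration of what Segment-level data management means here, below is a 
minimal, self-contained sketch. All class and method names in it (RemoteStore, 
ProducerAgent, ConsumerAgent, startNewSegment, tryWrite, readSegment) are 
hypothetical stand-ins and are not the real Flink interfaces or signatures. A real 
tier would implement TierFactory/TierMasterAgent/TierProducerAgent/TierConsumerAgent 
and talk to the Celeborn service instead of an in-memory map.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-ins; these are NOT the real Flink tier interfaces. */
public class SegmentTierSketch {

    /** Stand-in for the remote service: buffers are stored per (subpartition, segment). */
    static final class RemoteStore {
        private final Map<String, List<byte[]>> segments = new HashMap<>();

        private static String key(int subpartitionId, int segmentId) {
            return subpartitionId + "/" + segmentId;
        }

        void append(int subpartitionId, int segmentId, byte[] buffer) {
            segments.computeIfAbsent(key(subpartitionId, segmentId), k -> new ArrayList<>())
                    .add(buffer);
        }

        List<byte[]> read(int subpartitionId, int segmentId) {
            return segments.getOrDefault(
                    key(subpartitionId, segmentId), Collections.emptyList());
        }
    }

    /** Producer side (assumed shape): writes always go to the currently open segment. */
    static final class ProducerAgent {
        private final RemoteStore store;
        private final Map<Integer, Integer> openSegment = new HashMap<>();

        ProducerAgent(RemoteStore store) {
            this.store = store;
        }

        void startNewSegment(int subpartitionId, int segmentId) {
            openSegment.put(subpartitionId, segmentId);
        }

        boolean tryWrite(int subpartitionId, byte[] buffer) {
            Integer segmentId = openSegment.get(subpartitionId);
            if (segmentId == null) {
                return false; // no segment has been started for this subpartition
            }
            store.append(subpartitionId, segmentId, buffer);
            return true;
        }
    }

    /** Consumer side (assumed shape): data is requested one segment at a time. */
    static final class ConsumerAgent {
        private final RemoteStore store;

        ConsumerAgent(RemoteStore store) {
            this.store = store;
        }

        List<byte[]> readSegment(int subpartitionId, int segmentId) {
            return store.read(subpartitionId, segmentId);
        }
    }

    public static void main(String[] args) {
        RemoteStore store = new RemoteStore();
        ProducerAgent producer = new ProducerAgent(store);
        ConsumerAgent consumer = new ConsumerAgent(store);

        producer.startNewSegment(0, 0);
        producer.tryWrite(0, "a".getBytes(StandardCharsets.UTF_8));
        producer.tryWrite(0, "b".getBytes(StandardCharsets.UTF_8));
        producer.startNewSegment(0, 1);
        producer.tryWrite(0, "c".getBytes(StandardCharsets.UTF_8));

        System.out.println("segment 0 buffers: " + consumer.readSegment(0, 0).size());
        System.out.println("segment 1 buffers: " + consumer.readSegment(0, 1).size());
    }
}
```

The point of the sketch is only that both sides address data by subpartition and 
segment, so a tier can take over or hand off a subpartition at segment boundaries.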

2. Use the implemented tier to shuffle data.
 * Compile Flink and Celeborn.
 * Deploy the Celeborn service.
 ** Deploy a new Celeborn service with the newly compiled packages. You can 
refer to the documentation (https://celeborn.apache.org/docs/latest/) to deploy 
the cluster.
 * Add the compiled Flink plugin jar (celeborn-client-flink-xxx.jar) to the 
Flink classpath.
 * Configure the options to enable the feature.
 ** Set the option 
taskmanager.network.hybrid-shuffle.external-remote-tier-factory.class to the 
new Celeborn tier factory class. In addition to this option, the following 
options should also be added.

 
{code:java}
execution.batch-shuffle-mode: ALL_EXCHANGES_HYBRID_FULL
celeborn.master.endpoints: <the celeborn endpoint address>
celeborn.client.shuffle.partition.type: MAP
{code}
 * Run some test examples (e.g., WordCount) to verify the feature.
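
Taken together, the Flink configuration for this test might look like the 
following sketch. The value of the factory class option is a placeholder for the 
fully qualified name of whatever TierFactory implementation you built in step 1; 
the endpoint address is likewise left for you to fill in.

{code:java}
# Placeholder: fully qualified class name of your Celeborn TierFactory implementation
taskmanager.network.hybrid-shuffle.external-remote-tier-factory.class: <your Celeborn tier factory class>
execution.batch-shuffle-mode: ALL_EXCHANGES_HYBRID_FULL
celeborn.master.endpoints: <the celeborn endpoint address>
celeborn.client.shuffle.partition.type: MAP
{code}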

 



