steFaiz opened a new pull request, #7324:
URL: https://github.com/apache/paimon/pull/7324

   <!-- Please specify the module before the PR name: [core] ... or [flink] ... 
-->
   
   ### Purpose
   This PR optimizes Data-Evolution-Merge-Into in several aspects:
   1. Stop using Flink's `committerOperator`, which is problematic here: on each 
`endInput` it calls `FilterAndCommit`, which filters out every committable whose 
identifier is less than or equal to the latest snapshot's. In our implementation 
the latest committed identifier is always `END_INPUT`, i.e. `Long.MAX_VALUE`, so 
every subsequent committable would be filtered out. We use `batchTableCommit` 
directly instead.
   2. Use Calcite to rename the target table (the current implementation uses a 
regex, which is fragile).
   3. Use Calcite to find the `_row_id` field (if it exists) in the source 
table, which lets us eliminate the join step.
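
   To illustrate point 1, here is a minimal stdlib-only sketch of the filtering behavior (class and method names are illustrative, not Paimon's or Flink's actual API): once a committable with identifier `Long.MAX_VALUE` (`END_INPUT`) has been committed, a filter that drops every committable with an identifier less than or equal to the latest snapshot's identifier will drop everything that follows.

   ```java
   import java.util.List;
   import java.util.stream.Collectors;

   public class FilterAndCommitSketch {
       // Sentinel identifier used for batch END_INPUT commits.
       static final long END_INPUT = Long.MAX_VALUE;

       // Mimics the filter step: keep only committables strictly newer
       // than the latest committed snapshot identifier.
       static List<Long> filterCommittables(List<Long> committableIds, long latestSnapshotId) {
           return committableIds.stream()
                   .filter(id -> id > latestSnapshotId) // ids <= latest snapshot are dropped
                   .collect(Collectors.toList());
       }

       public static void main(String[] args) {
           // After the first END_INPUT commit, the latest snapshot id is Long.MAX_VALUE.
           long latestSnapshotId = END_INPUT;
           // A later endInput produces another END_INPUT committable ...
           List<Long> surviving = filterCommittables(List.of(END_INPUT), latestSnapshotId);
           // ... and it is filtered out: nothing would ever be committed again.
           System.out.println(surviving); // prints []
       }
   }
   ```

   This is why the PR commits through `batchTableCommit` directly rather than going through the committer operator.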
   
   <!-- Linking this pull request to the issue -->
   Linked issue: None
   
   <!-- What is the purpose of the change -->
   
   ### Tests
   Please see: 
`org.apache.paimon.flink.action.DataEvolutionMergeIntoActionITCase`.
   <!-- List UT and IT cases to verify this change -->
   
   ### API and Format
   None
   <!-- Does this change affect API or storage format -->
   
   ### Documentation
   None
   <!-- Does this change introduce a new feature -->
   
   ### Generative AI tooling
   No; fully hand-written.
   <!--
   If generative AI tooling has been used in the process of authoring this 
patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling 
Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
