[I] [Feature]: Add the self-optimizing.partition-filter parameter. [amoro]

via GitHub Thu, 23 Jan 2025 03:22:42 -0800


lintingbin opened a new issue, #3424:
URL: https://github.com/apache/amoro/issues/3424


   ### Description
   
   Add a self-optimizing.partition-filter parameter that restricts Amoro 
optimization to a specified range of partitions, similar to the where 
parameter of Spark's rewrite_data_files procedure.
   
   ### Use case/motivation
   
   Currently, when Amoro optimizes Iceberg tables, it optimizes data in all 
partitions by default. In practice, this causes the following problems:
   
   High cost of optimizing historical data: Some tables' historical data may 
not conform to Amoro's optimization rules, so optimizing all of it wastes 
resources and degrades performance.
   
   Conflicts between concurrent writes and optimization: Some historical 
partitions may undergo data repair operations that involve deletions and can 
run for a relatively long time. In such cases, it is preferable for Amoro to 
skip these partitions to avoid conflicts.
   
   ### Describe the solution
   
   Add the self-optimizing.partition-filter parameter, so that optimization 
only considers partitions matching the filter.
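
A minimal sketch of the intended behaviour, not Amoro's implementation. The issue does not define the filter syntax, so the single-comparison grammar (`<column> <op> '<literal>'`) and the helper names `parse_partition_filter` and `select_partitions` are assumptions made for illustration. Only the property name comes from the proposal.

```python
import operator
import re

# Hypothetical filter grammar: "<column> <op> '<literal>'". The real syntax
# for self-optimizing.partition-filter is not defined in this issue.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
_PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*'([^']*)'\s*$")


def parse_partition_filter(expr):
    """Turn a filter string into a predicate over a partition dict."""
    match = _PATTERN.match(expr)
    if match is None:
        raise ValueError(f"unsupported partition filter: {expr!r}")
    column, op, literal = match.groups()
    compare = _OPS[op]
    # Values are compared as strings, which is enough for ISO-8601 dates.
    return lambda partition: compare(partition[column], literal)


def select_partitions(partitions, table_properties):
    """Return the partitions an optimizer would consider.

    With no filter set, every partition is eligible, which matches the
    current behaviour described in this issue.
    """
    expr = table_properties.get("self-optimizing.partition-filter")
    if not expr:
        return list(partitions)
    predicate = parse_partition_filter(expr)
    return [p for p in partitions if predicate(p)]


partitions = [{"dt": "2024-12-30"}, {"dt": "2024-12-31"}, {"dt": "2025-01-01"}]
props = {"self-optimizing.partition-filter": "dt >= '2024-12-31'"}
selected = select_partitions(partitions, props)
```

Here `selected` holds only the two partitions from 2024-12-31 onward. Older partitions, such as those under long-running repair, would be left alone.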
   
   ### Subtasks
   
   - [ ] Add the self-optimizing.partition-filter parameter. @lintingbin 
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


