lintingbin opened a new issue, #3424: URL: https://github.com/apache/amoro/issues/3424
### Description

Specify the partition range for Amoro optimization by adding a self-optimizing.partition-filter parameter, similar to the where parameter of Spark's rewrite_data_files procedure.

### Use case/motivation

Currently, when Amoro optimizes Iceberg tables, it optimizes data in all partitions by default. In practice, this causes the following issues:

- High cost of optimizing historical data: Some tables' historical data may not conform to Amoro's optimization rules, and optimizing all of it wastes resources and degrades performance.
- Conflict between concurrent writing and optimization: Some historical partitions may undergo data repair operations involving deletions, which can take a long time. In such cases, it is preferable for Amoro to skip these partitions to avoid conflicts.

### Describe the solution

Add the self-optimizing.partition-filter parameter.

### Subtasks

- [ ] Add the self-optimizing.partition-filter parameter. @lintingbin

### Related issues

_No response_

### Are you willing to submit a PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
