rangareddy commented on issue #14889: URL: https://github.com/apache/hudi/issues/14889#issuecomment-3664318714
We have implemented the [HoodieRepairTool](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieRepairTool.java), a Spark-based utility designed to repair Hudi tables by identifying and removing dangling base and log files.

**Example Command:**

```sh
spark-submit \
  --class org.apache.hudi.utilities.HoodieRepairTool \
  --driver-memory 4g \
  --executor-memory 1g \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
  $HUDI_DIR/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.11.0-SNAPSHOT.jar \
  --mode repair \
  --base-path base_path \
  --backup-path backup_path \
  --start-instant-time ts1 \
  --end-instant-time ts2 \
  --assume-date-partitioning
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
