bryanburke opened a new issue #3641: URL: https://github.com/apache/hudi/issues/3641
**Describe the problem you faced**

I am not experiencing a problem. I would, however, like to request advice/peer review to make sure I am using the Hudi Java classes and methods in the most appropriate way.

Goal: retrieve the timestamp of the latest completed commit in a Hudi table, loading only Hudi metadata files from S3 in the process.

The sample code under **To Reproduce** below is the approach I use to accomplish this goal in a PySpark ETL script via `HoodieTableMetaClient`. The overall idea is to save the timestamp of the latest completed commit on the source Hudi table as a bookmark, so that the next run of the ETL script processes only the incremental changes after that point.

General questions:

- Is this approach valid? If not, what alternative do you suggest?
- Do the Hudi classes and methods I use have relatively stable public interfaces that are unlikely to change significantly over time?
- As development progresses, are there any plans to expose parts of Hudi's API via Python?

I appreciate your time and expertise! Thanks for creating and maintaining this incredible framework!

**To Reproduce**

Sample code:

```python
# sc already exists within the PySpark session.
source_path = "s3a://example-bucket/example-table/"

# https://github.com/apache/hudi/blob/release-0.9.0/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
# Reads table metadata from the .hoodie folder under source_path; no data files are scanned.
client = (
    sc._jvm
    .org.apache.hudi.common.table.HoodieTableMetaClient
    .builder()
    .setConf(sc._jsc.hadoopConfiguration())
    .setBasePath(source_path)
    .build()
)

# https://github.com/apache/hudi/blob/release-0.9.0/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java
# https://github.com/apache/hudi/blob/release-0.9.0/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
# https://github.com/apache/hudi/blob/release-0.9.0/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java
# Keep only completed instants (requested and inflight instants are dropped).
timeline = client.getCommitsTimeline().filterCompletedInstants()

# https://github.com/apache/hudi/blob/release-0.9.0/hudi-common/src/main/java/org/apache/hudi/common/util/Option.java
# https://github.com/apache/hudi/blob/release-0.9.0/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstant.java
# lastInstant() returns a Hudi Option<HoodieInstant>. Py4J passes None as a Java null,
# so orElse(None) comes back as None when the timeline has no completed commits.
last_processed = None
last_instant = timeline.lastInstant().orElse(None)
if last_instant is not None:
    last_processed = last_instant.getTimestamp()
```

**Environment Description**

* Hudi version : 0.9.0
* Spark version : 3.1.1
* Hive version : 2.3.7
* Hadoop version : 3.2.1
* Storage (HDFS/S3/GCS..) : S3A
* Running on Docker? (yes/no) : yes
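The bookmark idea described in the issue can be sketched in plain Python. This is a minimal, hypothetical sketch, not a Hudi API: the helper names (`load_bookmark`, `save_bookmark`, `has_new_commits`, `incremental_read_options`) and the local JSON bookmark file are assumptions for illustration, and it assumes Hudi instant times are digit strings that sort chronologically. The option keys `hoodie.datasource.query.type`, `hoodie.datasource.read.begin.instanttime` and `hoodie.datasource.read.end.instanttime` are Hudi Spark DataSource read options; using the `last_processed` value from the sample code as the end instant is intended to leave commits that land mid-run for the next bookmark.

```python
import json
import os
from typing import Dict, Optional

# Hypothetical local bookmark file; a real job would likely keep this in S3,
# a database, or a job-state store instead (assumption for illustration).
BOOKMARK_PATH = "hudi_bookmark.json"


def load_bookmark(path: str = BOOKMARK_PATH) -> Optional[str]:
    """Return the last processed Hudi instant time, or None on the first run."""
    if not os.path.exists(path):
        return None
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f).get("last_processed")


def save_bookmark(instant_time: str, path: str = BOOKMARK_PATH) -> None:
    """Persist the instant time of the latest commit this run processed."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"last_processed": instant_time}, f)


def has_new_commits(bookmark: Optional[str], latest: Optional[str]) -> bool:
    """True if the table has a completed commit newer than the bookmark.

    Assumes instant times are digit strings that sort chronologically,
    so plain string comparison gives the right order.
    """
    if latest is None:  # empty commits timeline: nothing to read yet
        return False
    return bookmark is None or latest > bookmark


def incremental_read_options(bookmark: Optional[str], latest: str) -> Dict[str, str]:
    """Spark DataSource options for reading commits after `bookmark` up to `latest`."""
    if bookmark is None:
        # First run: no bookmark yet, so read the current snapshot of the table.
        return {"hoodie.datasource.query.type": "snapshot"}
    return {
        "hoodie.datasource.query.type": "incremental",
        # Records from commits after this instant are returned.
        "hoodie.datasource.read.begin.instanttime": bookmark,
        # Upper bound, so commits landing mid-run are left for the next bookmark.
        "hoodie.datasource.read.end.instanttime": latest,
    }
```

In the PySpark script these would be wired together roughly as: load the bookmark, check it against `last_processed` with `has_new_commits`, read with `spark.read.format("hudi").options(**incremental_read_options(bookmark, last_processed)).load(source_path)`, and call `save_bookmark(last_processed)` only after the output has been written successfully, so a failed run restarts from the same point. One caveat of the snapshot fallback: it is not bounded by `last_processed`, so a commit that lands during the very first run could be read again by the next incremental run.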