Hey guys, I'm trying to use the Hudi CLI to connect to tables stored on S3 using the Glue metastore. Following a tip from Ashish M G on Slack <https://apache-hudi.slack.com/archives/C4D716NPQ/p1599243415197500?thread_ts=1599242852.196900&cid=C4D716NPQ>, I added the dependencies, rebuilt, and was able to use the connect command to connect to the table, albeit with warnings:
hudi->connect --path s3a://bucketName/path.parquet
29597 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3a://bucketName/path.parquet
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/username/hudi-cli/target/lib/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
29785 [Spring Shell] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
31060 [Spring Shell] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]
31380 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3a://bucketName/path.parquet/.hoodie/hoodie.properties
31455 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://bucketName/path.parquet
Metadata for table tablename loaded

However, many of the other commands don't seem to be working properly:

hudi:tablename->savepoints show
╔═══════════════╗
║ SavepointTime ║
╠═══════════════╣
║ (empty)       ║
╚═══════════════╝

hudi:tablename->savepoint create
Commit null not found in Commits org.apache.hudi.common.table.timeline.HoodieDefaultTimeline: [20200724220817__commit__COMPLETED]

hudi:tablename->stats filesizes
╔════════════╤═══════╤═══════╤═══════╤═══════╤═══════╤═══════╤══════════╤════════╗
║ CommitTime │ Min   │ 10th  │ 50th  │ avg   │ 95th  │ Max   │ NumFiles │ StdDev ║
╠════════════╪═══════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪════════╣
║ ALL        │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0        │ 0.0 B  ║
╚════════════╧═══════╧═══════╧═══════╧═══════╧═══════╧═══════╧══════════╧════════╝

hudi:tablename->show fsview all
171314 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3a://bucketName/path.parquet
171362 [Spring Shell] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]
171666 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3a://bucketName/path.parquet/.hoodie/hoodie.properties
171725 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://bucketName/path.parquet
171725 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for s3a://bucketName/path.parquet
171817 [Spring Shell] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200724220817__clean__COMPLETED], [20200724220817__commit__COMPLETED]]
172262 [Spring Shell] INFO org.apache.hudi.common.table.view.AbstractTableFileSystemView - addFilesToView: NumFiles=0, NumFileGroups=0, FileGroupsCreationTime=5, StoreTimeTaken=2
╔═══════════╤════════╤══════════════╤═══════════╤════════════════╤═════════════════╤═══════════════════════╤═════════════╗
║ Partition │ FileId │ Base-Instant │ Data-File │ Data-File Size │ Num Delta Files │ Total Delta File Size │ Delta Files ║
╠═══════════╧════════╧══════════════╧═══════════╧════════════════╧═════════════════╧═══════════════════════╧═════════════╣
║ (empty) ║
╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

I looked through the CLI code, and it seems that for full support we would need to handle the different storage backends (HDFS/S3/Azure/etc.) in HoodieTableMetaClient. From my understanding, checkTableValidity (which throws TableNotFoundException), one of the first steps when loading the table, checks only the HDFS filesystem. Could someone please clarify whether this is something already supported and I'm just not configuring it correctly, or whether it's something that would need to be added, and if so, whether my read of HoodieTableMetaClient is on the right track? Thanks,
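P.S. In case it helps frame the question: Hadoop's own `Path#getFileSystem(Configuration)` already resolves the right backend (HDFS, S3A, etc.) from the path's URI scheme, so presumably the validity check just needs to dispatch on the scheme rather than assume HDFS. A tiny stdlib-only sketch of that scheme resolution — this is my own illustration, not Hudi code, and the names are made up:

```java
import java.net.URI;

public class SchemeProbe {

    // Resolve the storage scheme of a table base path. A storage-agnostic
    // metaclient would dispatch on this rather than assuming HDFS; a bare
    // path with no scheme falls back to the local filesystem, mirroring
    // Hadoop's own path handling.
    static String schemeOf(String basePath) {
        String scheme = URI.create(basePath).getScheme();
        return (scheme == null) ? "file" : scheme;
    }

    public static void main(String[] args) {
        // An S3A table path like the one passed to `connect --path`
        System.out.println(schemeOf("s3a://bucketName/some/table"));    // s3a
        // An HDFS path, which is what a check pinned to HDFS expects
        System.out.println(schemeOf("hdfs://namenode:8020/tables/t1")); // hdfs
        // A bare local path with no scheme
        System.out.println(schemeOf("/tmp/tables/t1"));                 // file
    }
}
```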
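P.P.S. For reference, in case the configuration matters here: I'm relying on Hadoop's standard S3A settings. The key names below are Hadoop's real `fs.s3a.*` properties, but the values are placeholders and my actual credential setup may differ:

```xml
<!-- core-site.xml: standard Hadoop S3A settings (placeholder values) -->
<configuration>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```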