Hi Adam,

I have not used the CLI tool much, but the S3 filesystem is already supported in Hudi. You can check the following class for the list of file systems that are already supported: https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
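To illustrate the idea behind that class: Hudi keeps a whitelist of URI schemes it knows how to work with, and a path is accepted only if its scheme is on the list. Below is a minimal, self-contained sketch of that pattern (this is not the actual StorageSchemes class; the scheme list here is a hypothetical subset for illustration):

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class SchemeCheck {
    // Hypothetical subset of schemes; the authoritative list lives in
    // org.apache.hudi.common.fs.StorageSchemes.
    static final List<String> SUPPORTED =
        Arrays.asList("file", "hdfs", "s3", "s3a", "gs", "wasb");

    // Returns true if the path's URI scheme is on the whitelist.
    static boolean isSchemeSupported(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme != null && SUPPORTED.contains(scheme);
    }

    public static void main(String[] args) {
        System.out.println(isSchemeSupported("s3a://bucketName/path.parquet")); // true
        System.out.println(isSchemeSupported("foo://bucket/table"));            // false
    }
}
```

So an s3a:// base path like yours should pass this check; the failures you see below are more likely a configuration issue than a missing scheme.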
On Wed, Sep 9, 2020 at 6:46 AM Adam <[email protected]> wrote:

> Hey guys,
>
> I'm trying to use the Hudi CLI to connect to tables stored on S3 using the
> Glue metastore. Using a tip from Ashish M G
> <https://apache-hudi.slack.com/archives/C4D716NPQ/p1599243415197500?thread_ts=1599242852.196900&cid=C4D716NPQ>
> on Slack, I added the dependencies, re-built, and was able to use the
> connect command to connect to the table, albeit with warnings:
>
> hudi->connect --path s3a://bucketName/path.parquet
>
> 29597 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3a://bucketName/path.parquet
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/username/hudi-cli/target/lib/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
> WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 29785 [Spring Shell] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 31060 [Spring Shell] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]
> 31380 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3a://bucketName/path.parquet/.hoodie/hoodie.properties
> 31455 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://bucketName/path.parquet
>
> Metadata for table tablename loaded
>
> However, many of the other commands do not seem to be working properly:
>
> hudi:tablename->savepoints show
> ╔═══════════════╗
> ║ SavepointTime ║
> ╠═══════════════╣
> ║ (empty)       ║
> ╚═══════════════╝
>
> hudi:tablename->savepoint create
> Commit null not found in Commits org.apache.hudi.common.table.timeline.HoodieDefaultTimeline: [20200724220817__commit__COMPLETED]
>
> hudi:tablename->stats filesizes
> ╔════════════╤═══════╤═══════╤═══════╤═══════╤═══════╤═══════╤══════════╤════════╗
> ║ CommitTime │ Min   │ 10th  │ 50th  │ avg   │ 95th  │ Max   │ NumFiles │ StdDev ║
> ╠════════════╪═══════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪════════╣
> ║ ALL        │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0        │ 0.0 B  ║
> ╚════════════╧═══════╧═══════╧═══════╧═══════╧═══════╧═══════╧══════════╧════════╝
>
> hudi:tablename->show fsview all
> 171314 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from s3a://bucketName/path.parquet
> 171362 [Spring Shell] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]
> 171666 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from s3a://bucketName/path.parquet/.hoodie/hoodie.properties
> 171725 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://bucketName/path.parquet
> 171725 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for s3a://bucketName/path.parquet
> 171817 [Spring Shell] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200724220817__clean__COMPLETED], [20200724220817__commit__COMPLETED]]
> 172262 [Spring Shell] INFO org.apache.hudi.common.table.view.AbstractTableFileSystemView - addFilesToView: NumFiles=0, NumFileGroups=0, FileGroupsCreationTime=5, StoreTimeTaken=2
>
> ╔═══════════╤════════╤══════════════╤═══════════╤════════════════╤═════════════════╤═══════════════════════╤═════════════╗
> ║ Partition │ FileId │ Base-Instant │ Data-File │ Data-File Size │ Num Delta Files │ Total Delta File Size │ Delta Files ║
> ╠═══════════╧════════╧══════════════╧═══════════╧════════════════╧═════════════════╧═══════════════════════╧═════════════╣
> ║ (empty)                                                                                                                ║
> ╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
>
> I looked through the CLI code, and it seems that for true support we would
> need to add handling for the different storage options (HDFS/S3/Azure/etc.)
> in HoodieTableMetaClient. From my understanding,
> TableNotFoundException.checkTableValidity, one of the first steps in this
> flow, checks just the HDFS filesystem.
> Could someone please clarify whether this is something already supported
> and I'm just not configuring it correctly, or whether it would need to be
> added, and if so, whether the HoodieTableMetaClient is on the right track
> or not?
>
> Thanks,
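On the checkTableValidity point above: conceptually, that check only needs to verify that a `.hoodie` metadata folder exists under the base path, and Hadoop's FileSystem API already abstracts over hdfs://, s3a://, etc. Here is a minimal, self-contained sketch of that idea using local paths only (this is an illustration, not Hudi's actual implementation, which goes through the Hadoop FileSystem interface):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TableValidityCheck {
    // A Hudi table is identified by a .hoodie metadata folder under its
    // base path; this check is filesystem-agnostic in principle.
    static boolean looksLikeHudiTable(Path basePath) {
        return Files.isDirectory(basePath.resolve(".hoodie"));
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("tbl");
        Files.createDirectory(table.resolve(".hoodie"));
        System.out.println(looksLikeHudiTable(table));                          // true
        System.out.println(looksLikeHudiTable(Files.createTempDirectory("x"))); // false
    }
}
```

Since the CLI did load hoodie.properties from your s3a path, the validity check itself appears to have passed; the empty query results may instead point at how the active timeline or file-system view is being built.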
