Hey guys,
I'm trying to use the Hudi CLI to connect to tables stored on S3 using the
Glue metastore. Using a tip from Ashish M G
<https://apache-hudi.slack.com/archives/C4D716NPQ/p1599243415197500?thread_ts=1599242852.196900&cid=C4D716NPQ>
on Slack, I added the dependencies, re-built and was able to use the
connect command to connect to the table, albeit with warnings:

hudi->connect --path s3a://bucketName/path.parquet

29597 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableMetaClient  - Loading HoodieTableMetaClient from s3a://bucketName/path.parquet
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/username/hudi-cli/target/lib/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
29785 [Spring Shell] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
31060 [Spring Shell] INFO  org.apache.hudi.common.fs.FSUtils  - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]
31380 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableConfig  - Loading table properties from s3a://bucketName/path.parquet/.hoodie/hoodie.properties
31455 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://bucketName/path.parquet

Metadata for table tablename loaded
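
(Side note on the FSUtils line above: fs.defaultFS still points at file:///, yet the FileSystem resolved is S3AFileSystem. That part is expected, since Hadoop picks the FileSystem implementation from each path's scheme rather than from the default FS. A minimal sketch of that resolution, plain Hadoop rather than Hudi code, assuming hadoop-aws and valid AWS credentials are available:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SchemeResolution {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // fs.defaultFS stays file:///
    // Hadoop selects the FileSystem implementation from the path's scheme,
    // so an s3a:// path resolves to S3AFileSystem regardless of defaultFS.
    FileSystem fs = new Path("s3a://bucketName/path.parquet").getFileSystem(conf);
    System.out.println(fs.getClass().getName()); // org.apache.hadoop.fs.s3a.S3AFileSystem
  }
}
)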

However, many of the other commands do not seem to work properly:

hudi:tablename->savepoints show

╔═══════════════╗
║ SavepointTime ║
╠═══════════════╣
║ (empty)       ║
╚═══════════════╝

hudi:tablename->savepoint create

Commit null not found in Commits org.apache.hudi.common.table.timeline.HoodieDefaultTimeline: [20200724220817__commit__COMPLETED]


hudi:tablename->stats filesizes

╔════════════╤═══════╤═══════╤═══════╤═══════╤═══════╤═══════╤══════════╤════════╗
║ CommitTime │ Min   │ 10th  │ 50th  │ avg   │ 95th  │ Max   │ NumFiles │ StdDev ║
╠════════════╪═══════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪════════╣
║ ALL        │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0        │ 0.0 B  ║
╚════════════╧═══════╧═══════╧═══════╧═══════╧═══════╧═══════╧══════════╧════════╝


hudi:tablename->show fsview all

171314 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableMetaClient  - Loading HoodieTableMetaClient from s3a://bucketName/path.parquet
171362 [Spring Shell] INFO  org.apache.hudi.common.fs.FSUtils  - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]
171666 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableConfig  - Loading table properties from s3a://bucketName/path.parquet/.hoodie/hoodie.properties
171725 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://bucketName/path.parquet
171725 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableMetaClient  - Loading Active commit timeline for s3a://bucketName/path.parquet
171817 [Spring Shell] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline  - Loaded instants [[20200724220817__clean__COMPLETED], [20200724220817__commit__COMPLETED]]
172262 [Spring Shell] INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView  - addFilesToView: NumFiles=0, NumFileGroups=0, FileGroupsCreationTime=5, StoreTimeTaken=2

╔═══════════╤════════╤══════════════╤═══════════╤════════════════╤═════════════════╤═══════════════════════╤═════════════╗
║ Partition │ FileId │ Base-Instant │ Data-File │ Data-File Size │ Num Delta Files │ Total Delta File Size │ Delta Files ║
╠═══════════╧════════╧══════════════╧═══════════╧════════════════╧═════════════════╧═══════════════════════╧═════════════╣
║ (empty)                                                                                                                ║
╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

I looked through the CLI code, and it seems that for full support we would
need to handle the different storage options (HDFS/S3/Azure/etc.) in
HoodieTableMetaClient. From my reading, TableNotFoundException.checkTableValidity,
one of the first steps in that code path, validates only against the HDFS
filesystem.
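
To make the intent concrete, here's a minimal sketch of what I mean by
scheme-aware validation; the method name and exception below are only
illustrative, not the actual Hudi internals:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SchemeAwareValidity {

  // Resolve the FileSystem from the base path's scheme (hdfs://, s3a://,
  // abfs://, ...) rather than from the cluster default, then verify the
  // .hoodie metadata folder exists under it.
  static void checkTableValidity(String basePath, Configuration conf) throws Exception {
    FileSystem fs = FileSystem.get(new URI(basePath), conf);
    Path metaPath = new Path(basePath, ".hoodie");
    if (!fs.exists(metaPath) || !fs.isDirectory(metaPath)) {
      throw new IllegalArgumentException("Hoodie table not found at " + metaPath);
    }
  }
}

If the meta client already resolves the filesystem this way (e.g. through
FSUtils), then presumably this is just a configuration problem on my end.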

Could someone please clarify whether this is already supported and I'm just
not configuring it correctly, or whether it's something that would need to
be added? And if it's the latter, is my reading of HoodieTableMetaClient on
the right track?

Thanks,
