Hi Adam,

I have not used the CLI tool much, but the S3 filesystem is already
supported in Hudi. You can check the following class for the list of file
systems already supported:
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java
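For illustration, the idea behind that class is a whitelist of URI schemes that the code checks paths against. The sketch below is not Hudi's actual implementation, just a minimal standalone mock of that pattern (the scheme list shown is an assumed subset; the authoritative list lives in StorageSchemes.java linked above):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Minimal sketch (NOT Hudi's actual class) of a scheme whitelist like the
// one in org.apache.hudi.common.fs.StorageSchemes.
public class SchemeCheck {

    // Assumed subset of supported schemes, for illustration only.
    private static final List<String> SUPPORTED =
            Arrays.asList("file", "hdfs", "s3", "s3a", "gs", "wasb", "abfs");

    // Returns true when the given URI scheme is in the whitelist.
    public static boolean isSchemeSupported(String scheme) {
        return SUPPORTED.contains(scheme.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // An s3a:// path like the one in your connect command passes the check.
        System.out.println(isSchemeSupported("s3a"));
        // An unknown scheme would be rejected.
        System.out.println(isSchemeSupported("ftp"));
    }
}
```

So a path such as s3a://bucketName/... should already clear the scheme check; the issue is more likely in how the CLI resolves the filesystem afterwards.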

On Wed, Sep 9, 2020 at 6:46 AM Adam <[email protected]> wrote:

> Hey guys,
> I'm trying to use the Hudi CLI to connect to tables stored on S3 using the
> Glue metastore. Using a tip from Ashish M G
> <
> https://apache-hudi.slack.com/archives/C4D716NPQ/p1599243415197500?thread_ts=1599242852.196900&cid=C4D716NPQ
> >
> on Slack, I added the dependencies, rebuilt, and was able to use the
> connect command to connect to the table, albeit with warnings:
>
> hudi->connect --path s3a://bucketName/path.parquet
>
> 29597 [Spring Shell] INFO
> org.apache.hudi.common.table.HoodieTableMetaClient  - Loading
> HoodieTableMetaClient from s3a://bucketName/path.parquet
>
> WARNING: An illegal reflective access operation has occurred
>
> WARNING: Illegal reflective access by
> org.apache.hadoop.security.authentication.util.KerberosUtil
> (file:/home/username/hudi-cli/target/lib/hadoop-auth-2.7.3.jar) to method
> sun.security.krb5.Config.getInstance()
>
> WARNING: Please consider reporting this to the maintainers of
> org.apache.hadoop.security.authentication.util.KerberosUtil
>
> WARNING: Use --illegal-access=warn to enable warnings of further illegal
> reflective access operations
>
> WARNING: All illegal access operations will be denied in a future release
>
> 29785 [Spring Shell] WARN  org.apache.hadoop.util.NativeCodeLoader  -
> Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
>
> 31060 [Spring Shell] INFO  org.apache.hudi.common.fs.FSUtils  - Hadoop
> Configuration: fs.defaultFS: [file:///], Config:[Configuration:
> core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
> yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml],
> FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]
>
> 31380 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableConfig
> -
> Loading table properties from
> s3a://bucketName/path.parquet/.hoodie/hoodie.properties
>
> 31455 [Spring Shell] INFO
> org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading
> Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from
> s3a://bucketName/path.parquet
>
> Metadata for table tablename loaded
>
> However, many of the other commands do not seem to be working properly:
>
> hudi:tablename->savepoints show
>
> ╔═══════════════╗
>
> ║ SavepointTime ║
>
> ╠═══════════════╣
>
> ║ (empty)       ║
>
> ╚═══════════════╝
>
> hudi:tablename->savepoint create
>
> Commit null not found in Commits
> org.apache.hudi.common.table.timeline.HoodieDefaultTimeline:
> [20200724220817__commit__COMPLETED]
>
>
> hudi:tablename->stats filesizes
>
>
> ╔════════════╤═══════╤═══════╤═══════╤═══════╤═══════╤═══════╤══════════╤════════╗
>
> ║ CommitTime │ Min   │ 10th  │ 50th  │ avg   │ 95th  │ Max   │ NumFiles │
> StdDev ║
>
>
> ╠════════════╪═══════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪════════╣
>
> ║ ALL        │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0        │
> 0.0 B  ║
>
>
> ╚════════════╧═══════╧═══════╧═══════╧═══════╧═══════╧═══════╧══════════╧════════╝
>
>
> hudi:tablename->show fsview all
>
> 171314 [Spring Shell] INFO
> org.apache.hudi.common.table.HoodieTableMetaClient  - Loading
> HoodieTableMetaClient from s3a://bucketName/path.parquet
>
> 171362 [Spring Shell] INFO  org.apache.hudi.common.fs.FSUtils  - Hadoop
> Configuration: fs.defaultFS: [file:///], Config:[Configuration:
> core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
> yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml],
> FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]
>
> 171666 [Spring Shell] INFO
> org.apache.hudi.common.table.HoodieTableConfig  -
> Loading table properties from
> s3a://bucketName/path.parquet/.hoodie/hoodie.properties
>
> 171725 [Spring Shell] INFO
> org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading
> Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from
> s3a://bucketName/path.parquet
>
> 171725 [Spring Shell] INFO
> org.apache.hudi.common.table.HoodieTableMetaClient  - Loading Active commit
> timeline for s3a://bucketName/path.parquet
>
> 171817 [Spring Shell] INFO
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline  - Loaded
> instants [[20200724220817__clean__COMPLETED],
> [20200724220817__commit__COMPLETED]]
>
> 172262 [Spring Shell] INFO
> org.apache.hudi.common.table.view.AbstractTableFileSystemView  -
> addFilesToView: NumFiles=0, NumFileGroups=0, FileGroupsCreationTime=5,
> StoreTimeTaken=2
>
>
> ╔═══════════╤════════╤══════════════╤═══════════╤════════════════╤═════════════════╤═══════════════════════╤═════════════╗
>
> ║ Partition │ FileId │ Base-Instant │ Data-File │ Data-File Size │ Num
> Delta Files │ Total Delta File Size │ Delta Files ║
>
>
> ╠═══════════╧════════╧══════════════╧═══════════╧════════════════╧═════════════════╧═══════════════════════╧═════════════╣
>
> ║ (empty)
>                                               ║
>
>
> ╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
>
> I looked through the CLI code, and it seems that for full support we would
> need to handle the different storage options (hdfs/s3/azure/etc.) in
> HoodieTableMetaClient. From my understanding,
> TableNotFoundException.checkTableValidity, one of the first steps in this
> flow, checks only the HDFS filesystem.
>
> Could someone please clarify whether this is already supported and I'm
> just not configuring it correctly, or whether it would need to be added?
> If the latter, is my reading of HoodieTableMetaClient on the right track?
>
> Thanks,
>
