Hudi CLI AWS Glue & S3 Tables

2020-09-08 Thread Adam
Hey guys,
I'm trying to use the Hudi CLI to connect to tables stored on S3 using the
Glue metastore. Using a tip from Ashish M G on Slack
(https://apache-hudi.slack.com/archives/C4D716NPQ/p1599243415197500?thread_ts=1599242852.196900&cid=C4D716NPQ),
I added the dependencies, rebuilt, and was able to use the connect command
to connect to the table, albeit with warnings:

hudi->connect --path s3a://bucketName/path.parquet

29597 [Spring Shell] INFO
org.apache.hudi.common.table.HoodieTableMetaClient  - Loading
HoodieTableMetaClient from s3a://bucketName/path.parquet

WARNING: An illegal reflective access operation has occurred

WARNING: Illegal reflective access by
org.apache.hadoop.security.authentication.util.KerberosUtil
(file:/home/username/hudi-cli/target/lib/hadoop-auth-2.7.3.jar) to method
sun.security.krb5.Config.getInstance()

WARNING: Please consider reporting this to the maintainers of
org.apache.hadoop.security.authentication.util.KerberosUtil

WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations

WARNING: All illegal access operations will be denied in a future release

29785 [Spring Shell] WARN  org.apache.hadoop.util.NativeCodeLoader  -
Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable

31060 [Spring Shell] INFO  org.apache.hudi.common.fs.FSUtils  - Hadoop
Configuration: fs.defaultFS: [file:///], Config:[Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml],
FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]

31380 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableConfig  -
Loading table properties from
s3a://bucketName/path.parquet/.hoodie/hoodie.properties

31455 [Spring Shell] INFO
org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading
Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from
s3a://bucketName/path.parquet

Metadata for table tablename loaded
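
For anyone else trying this, the dependency step amounts to getting
hadoop-aws and the matching AWS SDK onto the CLI classpath. A minimal
sketch of what I did (jar versions here are assumptions and should match
the Hadoop jars already under hudi-cli/target/lib):

# from the hudi repo root: rebuild the CLI module and its dependencies
mvn clean package -DskipTests -pl hudi-cli -am
# add the S3A connector and the matching AWS SDK next to the other Hadoop jars
cp ~/.m2/repository/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar hudi-cli/target/lib/
cp ~/.m2/repository/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar hudi-cli/target/lib/
# S3 credentials come from the usual S3A properties in core-site.xml
# (fs.s3a.access.key / fs.s3a.secret.key) or an instance profile
cd hudi-cli && ./hudi-cli.sh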

However, many of the other commands seem to not be working properly:

hudi:tablename->savepoints show

╔═══════════════╗
║ SavepointTime ║
╠═══════════════╣
║ (empty)       ║
╚═══════════════╝

hudi:tablename->savepoint create

Commit null not found in Commits
org.apache.hudi.common.table.timeline.HoodieDefaultTimeline:
[20200724220817__commit__COMPLETED]


hudi:tablename->stats filesizes

╔════════════╤═══════╤═══════╤═══════╤═══════╤═══════╤═══════╤══════════╤════════╗
║ CommitTime │ Min   │ 10th  │ 50th  │ avg   │ 95th  │ Max   │ NumFiles │ StdDev ║
╠════════════╪═══════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪════════╣
║ ALL        │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0        │ 0.0 B  ║
╚════════════╧═══════╧═══════╧═══════╧═══════╧═══════╧═══════╧══════════╧════════╝


hudi:tablename->show fsview all

171314 [Spring Shell] INFO
org.apache.hudi.common.table.HoodieTableMetaClient  - Loading
HoodieTableMetaClient from s3a://bucketName/path.parquet

171362 [Spring Shell] INFO  org.apache.hudi.common.fs.FSUtils  - Hadoop
Configuration: fs.defaultFS: [file:///], Config:[Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml],
FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]

171666 [Spring Shell] INFO  org.apache.hudi.common.table.HoodieTableConfig  -
Loading table properties from
s3a://bucketName/path.parquet/.hoodie/hoodie.properties

171725 [Spring Shell] INFO
org.apache.hudi.common.table.HoodieTableMetaClient  - Finished Loading
Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from
s3a://bucketName/path.parquet

171725 [Spring Shell] INFO
org.apache.hudi.common.table.HoodieTableMetaClient  - Loading Active commit
timeline for s3a://bucketName/path.parquet

171817 [Spring Shell] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline  - Loaded
instants [[20200724220817__clean__COMPLETED],
[20200724220817__commit__COMPLETED]]

172262 [Spring Shell] INFO
org.apache.hudi.common.table.view.AbstractTableFileSystemView  -
addFilesToView: NumFiles=0, NumFileGroups=0, FileGroupsCreationTime=5,
StoreTimeTaken=2

╔═══════════╤════════╤══════════════╤═══════════╤════════════════╤═════════════════╤═══════════════════════╤═════════════╗
║ Partition │ FileId │ Base-Instant │ Data-File │ Data-File Size │ Num Delta Files │ Total Delta File Size │ Delta Files ║
╠═══════════╧════════╧══════════════╧═══════════╧════════════════╧═════════════════╧═══════════════════════╧═════════════╣
║ (empty)                                                                                                                ║
╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

I looked through the CLI code, and it seems that for true support we would
need to add support for the different storage options hdfs/s3/azure/etc. in
HoodieTableMetaClient. As from my unde

Re: Hudi CLI AWS Glue & S3 Tables

2020-09-09 Thread Pratyaksh Sharma
Hi Adam,

I have not used the CLI tool much, but the S3 filesystem is already supported
in Hudi. You can check the following class for the list of file systems that
are already supported:
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java

Re: Hudi CLI AWS Glue & S3 Tables

2020-09-09 Thread Vinoth Chandar
It's possible that some of the commands are not erroring gracefully for
missing parameters?

hudi:tablename->savepoint create

for example, would need a commit time for creating the savepoint.
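
For example, something like this, using the commit instant from the timeline
in your output (the --commit option name is from memory, so treat it as a
sketch rather than the exact syntax):

savepoint create --commit 20200724220817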

If you are able to connect to the dataset, then it should all be working.
