[GitHub] [hudi] hudi-bot removed a comment on pull request #4236: [HUDI-2936] Add data count checks in async clustering tests

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4236:
URL: https://github.com/apache/hudi/pull/4236#issuecomment-990699643


   
   ## CI report:
   
   * e4908379cb7faee6bdc554b0937b9a4557797eea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4071)
 
   * 4871c93376740dfc1d53ed7942d4eb96d8c1f0b7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4236: [HUDI-2936] Add data count checks in async clustering tests

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4236:
URL: https://github.com/apache/hudi/pull/4236#issuecomment-990705653


   
   ## CI report:
   
   * e4908379cb7faee6bdc554b0937b9a4557797eea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4071)
 
   * 4871c93376740dfc1d53ed7942d4eb96d8c1f0b7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4157)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4236: [HUDI-2936] Add data count checks in async clustering tests

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4236:
URL: https://github.com/apache/hudi/pull/4236#issuecomment-990699643


   
   ## CI report:
   
   * e4908379cb7faee6bdc554b0937b9a4557797eea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4071)
 
   * 4871c93376740dfc1d53ed7942d4eb96d8c1f0b7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4236: [HUDI-2936] Add data count checks in async clustering tests

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4236:
URL: https://github.com/apache/hudi/pull/4236#issuecomment-987617704


   
   ## CI report:
   
   * e4908379cb7faee6bdc554b0937b9a4557797eea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4071)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4274:
URL: https://github.com/apache/hudi/pull/4274#issuecomment-990655541


   
   ## CI report:
   
   * 1e718c4bcfe432a4ac03f807c889c67ee8d962ae Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4154)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4274:
URL: https://github.com/apache/hudi/pull/4274#issuecomment-990683626


   
   ## CI report:
   
   * 1e718c4bcfe432a4ac03f807c889c67ee8d962ae Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4154)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] suribabu-un commented on issue #4151: [SUPPORT] ClassNotFoundException: org.apache.hudi.hadoop.HoodieParquetInputFormat while running hive queries in EMR

2021-12-09 Thread GitBox


suribabu-un commented on issue #4151:
URL: https://github.com/apache/hudi/issues/4151#issuecomment-990681320


   The issue is unrelated to Hudi; it is caused by LLAP running in the EMR 
cluster. As mentioned above, if LLAP is disabled, the queries run as 
expected.
   The issue can be resolved by creating a new LLAP bundle (using the `hive --service 
llap` command) that includes all the jars from hive.aux.jars.path via the --auxjars parameter (you may 
also need to include aws-java-sdk), and then launching the server from the new bundle after 
stopping the running server. 






[GitHub] [hudi] suribabu-un closed issue #4151: [SUPPORT] ClassNotFoundException: org.apache.hudi.hadoop.HoodieParquetInputFormat while running hive queries in EMR

2021-12-09 Thread GitBox


suribabu-un closed issue #4151:
URL: https://github.com/apache/hudi/issues/4151


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-990661766


   
   ## CI report:
   
   * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148)
 
   * bcc62e5eeea6a2929e4144c00f2d0b29bcc786cd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4155)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-990681037


   
   ## CI report:
   
   * bcc62e5eeea6a2929e4144c00f2d0b29bcc786cd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4155)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] YannByron commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql

2021-12-09 Thread GitBox


YannByron commented on a change in pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#discussion_r766402385



##
File path: website/docs/quick-start-guide.md
##
@@ -175,18 +175,163 @@ values={[
 
 
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement 
to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, 
table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and 
external.
+  If one specifies a location using **location** statement or use `create 
external table` to create table explicitly, it is an external table, else its 
considered a managed table.
+  You can read more about external vs managed tables 
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table 
is considered a non-primary keyed table.

Review comment:
   I'll remove `Table with primary key`, which is redundant with the `notes` 
below, and move the notes here.








[GitHub] [hudi] JoshuaZhuCN opened a new issue #4275: [SUPPORT] How can I control the number of archive files

2021-12-09 Thread GitBox


JoshuaZhuCN opened a new issue #4275:
URL: https://github.com/apache/hudi/issues/4275


   
   When I use async clustering, a lot of archive files are generated, with names similar to 
commits.archive.xx_1-0-1.
   
   How can I control or clean up the number of these files?
   
   **Environment Description**
   
   * Hudi version : 0.9.0
   
   * Spark version : 2.4.7
   
   * Hive version : ~
   
   * Hadoop version : 3.1.1
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   






[GitHub] [hudi] hudi-bot commented on pull request #4222: [HUDI-2849] improve SparkUI job description for write path

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#issuecomment-990672722


   
   ## CI report:
   
   * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4153)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4222: [HUDI-2849] improve SparkUI job description for write path

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#issuecomment-990651217


   
   ## CI report:
   
   * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4153)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] YannByron commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql

2021-12-09 Thread GitBox


YannByron commented on a change in pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#discussion_r766402385



##
File path: website/docs/quick-start-guide.md
##
@@ -175,18 +175,163 @@ values={[
 
 
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement 
to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, 
table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and 
external.
+  If one specifies a location using **location** statement or use `create 
external table` to create table explicitly, it is an external table, else its 
considered a managed table.
+  You can read more about external vs managed tables 
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table 
is considered a non-primary keyed table.

Review comment:
   I'll remove the `Table Types` and `Table with primary key` sections, which 
are redundant with `Create Table Properties` below, and move `Create Table 
Properties` and the notes here.








[GitHub] [hudi] YannByron commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql

2021-12-09 Thread GitBox


YannByron commented on a change in pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#discussion_r766387860



##
File path: website/docs/quick-start-guide.md
##
@@ -175,18 +175,163 @@ values={[
 
 
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement 
to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, 
table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and 
external.
+  If one specifies a location using **location** statement or use `create 
external table` to create table explicitly, it is an external table, else its 
considered a managed table.
+  You can read more about external vs managed tables 
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table 
is considered a non-primary keyed table.
+  One needs to set **primaryKey** column in options to create a primary key 
table.
+  If you are using any of the built-in key generators in Hudi, likely it is a 
primary key table.
+
+Let's go over some of the create table commands.
+
+**Create a Non-Partitioned Table**
+
 ```sql
--- 
-create table if not exists hudi_table2(
-  id int, 
-  name string, 
+-- create a cow table, with default primaryKey 'uuid' and without 
preCombineField provided
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
   price double
+) using hudi;
+
+
+-- create a mor non-partitioned table without preCombineField provided
+create table hudi_mor_tbl (
+  id int,
+  name string,
+  price double,
+  ts bigint
 ) using hudi
-options (
-  type = 'cow'
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
 );
 ```
 
+Here is an example of creating an external COW partitioned table.
+
+**Create Partitioned Table**
+
+```sql
+-- create a partitioned, preCombineField-provided cow table
+create table hudi_cow_pt_tbl (
+  id bigint,
+  name string,
+  ts bigint,
+  dt string,
+  hh string
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (dt, hh)
+location '/tmp/hudi/hudi_cow_pt_tbl';
+```
+
+**Create Table for an existing Hudi Table**
+
+We can create a table on an existing hudi table(created with spark-shell or 
deltastreamer). This is useful to
+read/write to/from a pre-existing hudi table.
+
+```sql
+-- create an external hudi table based on an existing path
+
+-- for non-partitioned table
+create table hudi_existing_tbl0 using hudi
+location 'file:///tmp/hudi/dataframe_hudi_nonpt_table';
+
+-- for partitioned table
+create table hudi_existing_tbl1 using hudi
+partitioned by (dt, hh)
+location 'file:///tmp/hudi/dataframe_hudi_pt_table';
+```
+
+:::tip
+You don't need to specify schema and any properties except the partitioned 
columns if existed. Hudi can automatically recognize the schema and 
configurations.
+:::
+
+**CTAS**
+
+Hudi supports CTAS(Create Table As Select) on spark sql. 
+Note: For better performance to load data to hudi table, CTAS uses the **bulk 
insert** as the write operation.
+
+Example CTAS command to create a non-partitioned COW table without 
preCombineField.
+
+```sql
+-- CTAS: create a non-partitioned cow table without preCombineField
+create table hudi_ctas_cow_nonpcf_tbl
+using hudi
+tblproperties (primaryKey = 'id')
+as
+select 1 as id, 'a1' as name, 10 as price;
+```
+
+Example CTAS command to create a partitioned, primary key COW table.
+
+```sql
+-- CTAS: create a partitioned, preCombineField-provided cow table
+create table hudi_ctas_cow_pt_tbl
+using hudi
+tblproperties (type = 'cow', primaryKey = 'id', preCombineField = 'ts')
+partitioned by (dt)
+as
+select 1 as id, 'a1' as name, 10 as price, 1000 as ts, '2021-12-01' as dt;
+
+```
+
+Example CTAS command to load data from another table.
+
+```sql
+# create managed parquet table
+create table parquet_mngd using parquet location 
'file:///tmp/parquet_dataset/*.parquet';
+
+# CTAS by loading data into hudi table
+create table hudi_ctas_cow_pt_tbl2 using hudi location 
'file:/tmp/hudi/hudi_tbl/' options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (datestr) as select * from parquet_mngd;
+```
+
+**Create Table Properties**
+
+Users can set table 

[hudi] branch asf-site updated (34e151d -> d003ae0)

2021-12-09 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 34e151d  [MINOR] Fix asf-site build error (#4273)
 add d003ae0  Travis CI build asf-site

No new revisions were added by this update.

Summary of changes:
 content/404.html   |  12 ++--
 content/404/index.html |  12 ++--
 content/assets/css/styles.32d50a6e.css |   1 +
 content/assets/css/styles.f788c9dd.css |  25 
 ...java_after-55881866a88c6c761b91623f020f919d.png | Bin
 ...ava_before-4380ebd14248afbd45938ccf55d96781.png | Bin
 .../IDE_setup_code_style_java_after.png| Bin
 .../IDE_setup_code_style_java_before.png   | Bin
 .../{0030fd86.618769a9.js => 0030fd86.387e8470.js} |   0
 .../{009f67ce.dc895153.js => 009f67ce.951d7d64.js} |   2 +-
 .../{02e54e09.e89eb23f.js => 02e54e09.af4b8497.js} |   0
 .../{02ff5d42.a8fe5023.js => 02ff5d42.d0dcdc4a.js} |   0
 .../{0480b142.cf906eb8.js => 0480b142.81014b37.js} |   0
 .../{04b49851.a41ad5b9.js => 04b49851.abc08d16.js} |   0
 .../{078339bb.bb8597fd.js => 078339bb.0c40da37.js} |   0
 .../{07deb48b.0a2ea837.js => 07deb48b.f609e50b.js} |   2 +-
 .../{0871002b.f48a7e3d.js => 0871002b.8d044703.js} |   0
 .../{09138901.1755be46.js => 09138901.3d66d075.js} |   0
 .../{09ff3d76.2835ca67.js => 09ff3d76.94712ee3.js} |   0
 .../{0a91021f.9db6d0c4.js => 0a91021f.91532e24.js} |   0
 .../{0b82d45d.bc3b7671.js => 0b82d45d.21d71e74.js} |   0
 .../{0c12eeea.26d92167.js => 0c12eeea.487f5f85.js} |   0
 .../{0c3d0366.9093a5b3.js => 0c3d0366.dde224ad.js} |   0
 .../{1007513a.56e6d13c.js => 1007513a.cc672f90.js} |   0
 .../{10ac9a3e.5fd72179.js => 10ac9a3e.8d0896cc.js} |   0
 content/assets/js/10b6d210.865ababb.js |   1 +
 content/assets/js/10b6d210.9ec23b9b.js |   1 -
 .../{12b957b7.bc758db5.js => 12b957b7.a8f1703d.js} |   0
 .../{149a2d9e.aaeb5871.js => 149a2d9e.31e38105.js} |   0
 .../{15ea2a5f.80750ba1.js => 15ea2a5f.88c93791.js} |   0
 content/assets/js/17896441.362ddb10.js |   1 +
 content/assets/js/17896441.a8c03b97.js |   1 -
 .../{19560f91.4c5f7133.js => 19560f91.51fadab6.js} |   0
 .../{1a20bc57.06f5bfb5.js => 1a20bc57.07c04f1b.js} |   0
 .../{1be78505.d2e6b112.js => 1be78505.47ce07ac.js} |   2 +-
 .../{1c3a958e.483821fa.js => 1c3a958e.96a08cd0.js} |   0
 .../{1db64337.0309d4b8.js => 1db64337.062e874a.js} |   0
 .../{1dba1ecf.fe58e182.js => 1dba1ecf.0187c054.js} |   0
 .../{1efbb938.e17be128.js => 1efbb938.ba9c353a.js} |   0
 content/assets/js/1f391b9e.3e4c536c.js |   1 -
 content/assets/js/1f391b9e.7bd79868.js |   1 +
 .../{1f8198a4.a01c3cfe.js => 1f8198a4.410dd3bb.js} |   0
 .../{1f97a7ff.4bd86959.js => 1f97a7ff.94f6a34f.js} |   0
 .../{20a6876f.1bc702ae.js => 20a6876f.c9a3a955.js} |   0
 .../{2153fb85.9c87dbe7.js => 2153fb85.809961c2.js} |   0
 .../{2263a65b.a891b40d.js => 2263a65b.78dc9fb7.js} |   0
 .../{23421dc8.c1f1f613.js => 23421dc8.d413b5dd.js} |   0
 .../{244c7b0a.b3d63e1b.js => 244c7b0a.bd4a4ba1.js} |   0
 .../{246d116d.64c3a7db.js => 246d116d.9ab3cee3.js} |   0
 .../{24f4e7d7.d7c2d76f.js => 24f4e7d7.16edd9c9.js} |   0
 .../{25aa47d2.42b070d2.js => 25aa47d2.b743e786.js} |   0
 .../{26115f23.eade5b49.js => 26115f23.8298cff5.js} |   0
 .../{261fe657.c0cd50b5.js => 261fe657.6cb7d5c3.js} |   0
 .../{2760fb69.85be465a.js => 2760fb69.46962a6d.js} |   0
 .../{2884dc3d.f6414a49.js => 2884dc3d.70aa2361.js} |   0
 .../{2947aa63.c0591c02.js => 2947aa63.0118dd72.js} |   0
 .../{29a0dcae.467bd8b9.js => 29a0dcae.a8f7cb8d.js} |   0
 .../{29db9f25.48087af2.js => 29db9f25.b91c3f3b.js} |   0
 .../{2a11e6a7.cd24f7a3.js => 2a11e6a7.95167bfc.js} |   0
 .../{2a5e97be.7e661803.js => 2a5e97be.f7d21b42.js} |   0
 .../{2a74f6a7.1e0d498c.js => 2a74f6a7.ded94ab5.js} |   0
 .../{2a7d5452.acceaca1.js => 2a7d5452.51c4f429.js} |   0
 .../{2aa42d18.b915ac19.js => 2aa42d18.868730cb.js} |   0
 .../{2b154460.9fffdbc1.js => 2b154460.e89a64a8.js} |   0
 .../{2b4cfa56.17268ab4.js => 2b4cfa56.62062312.js} |   0
 .../{2da5f59f.07b82329.js => 2da5f59f.b54ed0ce.js} |   0
 .../{2dada088.ee14934b.js => 2dada088.1bf958c0.js} |   0
 .../{2dcd9099.d9a4c00b.js => 2dcd9099.7d58768f.js} |   0
 .../{2df3fdca.e2b1589b.js => 2df3fdca.3988e307.js} |   0
 .../{2e72ea50.5e68f3da.js => 2e72ea50.112834ea.js} |   0
 .../{2e7e1134.a60cc0aa.js => 2e7e1134.fc54a73f.js} |   0
 .../{2fe15297.af8adbbe.js => 2fe15297.57295fe9.js} |   0
 .../{306a8c6c.d9a4a611.js => 306a8c6c.a37a4615.js} |   0
 .../{32eb34e5.51b43dcf.js => 32eb34e5.0cee5193.js} |   0
 .../{33ab05f6.784976e3.js => 33ab05f6.81636ed8.js} |   0
 .../{3415fffa.0a7bfd48.js => 3415fffa.5097447d.js} |   0
 .../{3523854b.9c36f3b7.js => 3523854b.c61a8e80.js} |   0
 .../{3533dbd1.2827e62b.js => 3533dbd1.ae176c3b.js} |   2 +-
 .../{35f2b245.51a60e98.js => 35f2b245.1b242bed.js} |   0
 .../{370287c4.39768faa.js => 

[GitHub] [hudi] YannByron commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql

2021-12-09 Thread GitBox


YannByron commented on a change in pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#discussion_r766383550



##
File path: website/docs/quick-start-guide.md
##
@@ -175,18 +175,163 @@ values={[
 
 
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement 
to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, 
table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and 
external.
+  If one specifies a location using **location** statement or use `create 
external table` to create table explicitly, it is an external table, else its 
considered a managed table.
+  You can read more about external vs managed tables 
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table 
is considered a non-primary keyed table.

Review comment:
   OK, I'll update this here.








[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-990660728


   
   ## CI report:
   
   * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148)
 
   * bcc62e5eeea6a2929e4144c00f2d0b29bcc786cd UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-990661766


   
   ## CI report:
   
   * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148)
 
   * bcc62e5eeea6a2929e4144c00f2d0b29bcc786cd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4155)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-990660728


   
   ## CI report:
   
   * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148)
 
   * bcc62e5eeea6a2929e4144c00f2d0b29bcc786cd UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-990611154


   
   ## CI report:
   
   * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148)
 
   
   




[GitHub] [hudi] xushiyan commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql

2021-12-09 Thread GitBox


xushiyan commented on a change in pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#discussion_r766367063



##
File path: website/docs/quick-start-guide.md
##
@@ -175,18 +175,163 @@ values={[
 
 
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.

Review comment:
   ```suggestion
 Both of Hudi's table types (Copy-On-Write (COW) and Merge-On-Read (MOR)) 
can be created using Spark SQL.
   ```

##
File path: website/docs/quick-start-guide.md
##
@@ -175,18 +175,163 @@ values={[
 
 
 
+Spark-sql needs an explicit create table command.

Review comment:
   ```suggestion
   Spark SQL needs an explicit create table command.
   ```

##
File path: website/docs/quick-start-guide.md
##
@@ -175,18 +175,163 @@ values={[
 
 
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.

Review comment:
   ```suggestion
 Users can create a partitioned table or a non-partitioned table in Spark 
SQL.
   ```

##
File path: website/docs/quick-start-guide.md
##
@@ -175,18 +175,163 @@ values={[
 
 
 
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be 
created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. 
**type = 'cow'** represents COW table, while **type = 'mor'** represents MOR 
table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement 
to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, 
table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and 
external.
+  If one specifies a location using **location** statement or use `create 
external table` to create table explicitly, it is an external table, else its 
considered a managed table.
+  You can read more about external vs managed tables 
[here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table 
is considered a non-primary keyed table.
+  One needs to set **primaryKey** column in options to create a primary key 
table.
+  If you are using any of the built-in key generators in Hudi, likely it is a 
primary key table.
+
+Let's go over some of the create table commands.
+
+**Create a Non-Partitioned Table**
+
 ```sql
--- 
-create table if not exists hudi_table2(
-  id int, 
-  name string, 
+-- create a cow table, with default primaryKey 'uuid' and without preCombineField provided
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
   price double
+) using hudi;
+
+
+-- create a mor non-partitioned table with preCombineField provided
+create table hudi_mor_tbl (
+  id int,
+  name string,
+  price double,
+  ts bigint
 ) using hudi
-options (
-  type = 'cow'
+tblproperties (
+  type = 'mor',
+  primaryKey = 'id',
+  preCombineField = 'ts'
 );
 ```
 
+Here is an example of creating an external COW partitioned table.
+
+**Create Partitioned Table**
+
+```sql
+-- create a partitioned, preCombineField-provided cow table
+create table hudi_cow_pt_tbl (
+  id bigint,
+  name string,
+  ts bigint,
+  dt string,
+  hh string
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (dt, hh)
+location '/tmp/hudi/hudi_cow_pt_tbl';
+```
+
+**Create Table for an existing Hudi Table**
+
+We can create a table on an existing hudi table(created with spark-shell or 
deltastreamer). This is useful to
+read/write to/from a pre-existing hudi table.
+
+```sql
+-- create an external hudi table based on an existing path
+
+-- for non-partitioned table
+create table hudi_existing_tbl0 using hudi
+location 'file:///tmp/hudi/dataframe_hudi_nonpt_table';
+
+-- for partitioned table
+create table hudi_existing_tbl1 using hudi
+partitioned by (dt, hh)
+location 'file:///tmp/hudi/dataframe_hudi_pt_table';
+```
+
+:::tip
+You don't need to specify schema and any properties except the partitioned 
columns if existed. Hudi can automatically recognize the schema and 
configurations.
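
A minimal sketch, not part of the diff under review, of how the options described in the guide text above (table type, primary key, preCombine field, partition columns, location) combine into one statement. `build_create_table` is a hypothetical helper introduced only for illustration; it assembles text and does not talk to Spark or Hudi.

```python
def build_create_table(name, columns, table_type="cow", primary_key=None,
                       precombine_field=None, partitioned_by=(), location=None):
    """Assemble a Spark SQL `create table ... using hudi` statement as text."""
    if table_type not in ("cow", "mor"):
        raise ValueError("table_type must be 'cow' or 'mor'")
    cols = ",\n".join(f"  {col} {typ}" for col, typ in columns)
    # Table-level settings go into tblproperties, as in the diff above.
    props = [f"type = '{table_type}'"]
    if primary_key:
        props.append(f"primaryKey = '{primary_key}'")
    if precombine_field:
        props.append(f"preCombineField = '{precombine_field}'")
    lines = [
        f"create table {name} (",
        cols,
        ") using hudi",
        "tblproperties (",
        ",\n".join(f"  {p}" for p in props),
        ")",
    ]
    # No partition columns means a non-partitioned table.
    if partitioned_by:
        lines.append(f"partitioned by ({', '.join(partitioned_by)})")
    # An explicit location makes the table external.
    if location:
        lines.append(f"location '{location}'")
    return "\n".join(lines) + ";"


print(build_create_table(
    "hudi_cow_pt_tbl",
    [("id", "bigint"), ("name", "string"), ("ts", "bigint"),
     ("dt", "string"), ("hh", "string")],
    primary_key="id",
    precombine_field="ts",
    partitioned_by=["dt", "hh"],
    location="/tmp/hudi/hudi_cow_pt_tbl",
))
```

Leaving out `partitioned_by` and `location` yields the non-partitioned, managed variant.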

[hudi] branch master updated (ea154bc -> 456d74c)

2021-12-09 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from ea154bc  Revert "Claiming RFC for data skipping index for updated 
version (#4271)" (#4272)
 add 456d74c  [HUDI-2901] Fixed the bug clustering jobs cannot running in 
parallel (#4178)

No new revisions were added by this update.

Summary of changes:
 .../MultipleSparkJobExecutionStrategy.java | 11 +++---
 .../util/{Functions.java => FutureUtils.java}  | 45 +-
 2 files changed, 24 insertions(+), 32 deletions(-)
 copy hudi-common/src/main/java/org/apache/hudi/common/util/{Functions.java => 
FutureUtils.java} (50%)


[GitHub] [hudi] yihua merged pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


yihua merged pull request #4178:
URL: https://github.com/apache/hudi/pull/4178


   






[GitHub] [hudi] hudi-bot commented on pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4274:
URL: https://github.com/apache/hudi/pull/4274#issuecomment-990655541


   
   ## CI report:
   
   * 1e718c4bcfe432a4ac03f807c889c67ee8d962ae Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4154)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4274:
URL: https://github.com/apache/hudi/pull/4274#issuecomment-990654497


   
   ## CI report:
   
   * 1e718c4bcfe432a4ac03f807c889c67ee8d962ae UNKNOWN
   
   




[GitHub] [hudi] Carl-Zhou-CN commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena

2021-12-09 Thread GitBox


Carl-Zhou-CN commented on issue #4267:
URL: https://github.com/apache/hudi/issues/4267#issuecomment-990654733


   Because of your Hudi version, you may need to update the partitions manually 
after each write:
   ALTER TABLE table_name RECOVER PARTITIONS;
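
As a rough sketch of automating that step, the statement could be generated per table after each write. The helper name, the optional database qualifier, and the backtick quoting are assumptions for illustration; how the statement is submitted (query console, client library, or job step) is deliberately left out.

```python
def recover_partitions_sql(table_name, database=None):
    """Build the partition-recovery statement quoted above for one table."""
    parts = [database, table_name] if database else [table_name]
    for part in parts:
        # Reject empty names and embedded backticks instead of trying to escape them.
        if not part or "`" in part:
            raise ValueError(f"invalid identifier: {part!r}")
    qualified = ".".join(f"`{p}`" for p in parts)
    return f"ALTER TABLE {qualified} RECOVER PARTITIONS;"


print(recover_partitions_sql("my_hudi_table"))
```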






[GitHub] [hudi] hudi-bot commented on pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4274:
URL: https://github.com/apache/hudi/pull/4274#issuecomment-990654497


   
   ## CI report:
   
   * 1e718c4bcfe432a4ac03f807c889c67ee8d962ae UNKNOWN
   
   




[jira] [Updated] (HUDI-2974) Make the prefix for metrics name configurable

2021-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2974:
-
Labels: pull-request-available  (was: )

> Make the prefix for metrics name configurable
> -
>
> Key: HUDI-2974
> URL: https://issues.apache.org/jira/browse/HUDI-2974
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Rajesh Mahindra
>Priority: Major
>  Labels: pull-request-available
>
> Currently, metric names always start with the table name. This makes it less 
> flexible to create Grafana dashboards with Prometheus queries, since it is easier 
> to have consistent metric names across all Spark/DeltaStreamer jobs.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] rmahindra123 opened a new pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable

2021-12-09 Thread GitBox


rmahindra123 opened a new pull request #4274:
URL: https://github.com/apache/hudi/pull/4274


   Currently, metric names always start with the table name. This makes it less 
flexible to create Grafana dashboards with Prometheus queries, since it is easier 
to have consistent metric names across all Spark/DeltaStreamer jobs.
   
   Adding a new config for the prefix name, but keeping the default as the 
table name to ensure compatibility with current deployments.
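
A minimal sketch of the behaviour described above, assuming a dotted `prefix.action.metric` naming layout. The function and parameter names are hypothetical and are not taken from the PR; the only point illustrated is the fallback to the table name when no prefix is configured.

```python
def metric_name(table_name, action, metric, prefix=None):
    """Build a metric name; use the table name when no prefix is configured."""
    # An unset or empty prefix keeps the existing, table-name-based naming.
    effective_prefix = prefix if prefix else table_name
    return f"{effective_prefix}.{action}.{metric}"


# Default: unchanged naming, so current dashboards keep working.
print(metric_name("trips", "commit", "duration"))
# Shared prefix: the same name for every job, regardless of table.
print(metric_name("trips", "commit", "duration", prefix="hudi"))
```

With a shared prefix, one dashboard query can cover every job, at the cost of needing another way (for example a label) to tell tables apart.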
   
   






[jira] [Created] (HUDI-2974) Make the prefix for metrics name configurable

2021-12-09 Thread Rajesh Mahindra (Jira)
Rajesh Mahindra created HUDI-2974:
-

 Summary: Make the prefix for metrics name configurable
 Key: HUDI-2974
 URL: https://issues.apache.org/jira/browse/HUDI-2974
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Rajesh Mahindra


Currently, metric names always start with the table name. This makes it less 
flexible to create Grafana dashboards with Prometheus queries, since it is easier 
to have consistent metric names across all Spark/DeltaStreamer jobs.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] codope merged pull request #4273: [MINOR] Fix asf-site build error

2021-12-09 Thread GitBox


codope merged pull request #4273:
URL: https://github.com/apache/hudi/pull/4273


   






[hudi] branch asf-site updated: [MINOR] Fix asf-site build error (#4273)

2021-12-09 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 34e151d  [MINOR] Fix asf-site build error (#4273)
34e151d is described below

commit 34e151d3198586544a3864e7e1e70d4be184108c
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Thu Dec 9 22:22:06 2021 -0800

[MINOR] Fix asf-site build error (#4273)
---
 website/package.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/package.json b/website/package.json
index 0d5f069..fbda5d8 100644
--- a/website/package.json
+++ b/website/package.json
@@ -14,7 +14,7 @@
 "write-heading-ids": "docusaurus write-heading-ids"
   },
   "dependencies": {
-"@docusaurus/core": "2.0.0-beta.3",
+"@docusaurus/core": "^2.0.0-beta.3",
 "@docusaurus/plugin-client-redirects": "^2.0.0-beta.3",
 "@docusaurus/plugin-sitemap": "^2.0.0-beta.3",
 "@docusaurus/preset-classic": "2.0.0-beta.3",


[GitHub] [hudi] hudi-bot removed a comment on pull request #4222: [HUDI-2849] improve SparkUI job description for write path

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#issuecomment-990556714


   
   ## CI report:
   
   * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4222: [HUDI-2849] improve SparkUI job description for write path

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#issuecomment-990651217


   
   ## CI report:
   
   * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4153)
 
   
   




[GitHub] [hudi] YuweiXiao commented on pull request #4222: [HUDI-2849] improve SparkUI job description for write path

2021-12-09 Thread GitBox


YuweiXiao commented on pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#issuecomment-990650172


   @hudi-bot run azure






[GitHub] [hudi] xiarixiaoyao commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


xiarixiaoyao commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990647589


   @vinothchandar  @alexeykudinkin  @leesf  I have updated the code and addressed 
all comments. Please help review again, thanks.






[GitHub] [hudi] leesf commented on pull request #3964: [HUDI-2732][RFC-38] Spark Datasource V2 Integration

2021-12-09 Thread GitBox


leesf commented on pull request #3964:
URL: https://github.com/apache/hudi/pull/3964#issuecomment-990645306


   > > And In the first phase, we would fallback to V1 write path
   > 
   > Can this be done? Love to see some code for this.
   
   Yes, I will open a PR in the next few days.






[GitHub] [hudi] Carl-Zhou-CN commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena

2021-12-09 Thread GitBox


Carl-Zhou-CN commented on issue #4267:
URL: https://github.com/apache/hudi/issues/4267#issuecomment-990644240


   "hoodie.datasource.hive_sync.enable": "true",
   "hoodie.datasource.hive_sync.table": "my_hudi_table",
   "hoodie.datasource.hive_sync.partition_fields": "creation_date",
   "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
   
   @Arun-kc If you do not register your Hudi dataset as a table in the Hive 
metastore, these options are not required.
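
A small sketch of dropping those options from a writer configuration, assuming the options are held in a plain Python dict as in the reporter's snippet. `without_hive_sync` is a hypothetical helper, and the prefix match relies only on the option names listed above.

```python
HIVE_SYNC_PREFIX = "hoodie.datasource.hive_sync."


def without_hive_sync(options):
    """Return a copy of the writer options with every hive_sync key removed."""
    return {k: v for k, v in options.items() if not k.startswith(HIVE_SYNC_PREFIX)}


hudi_options = {
    "hoodie.table.name": "my_hudi_table",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "creation_date",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.table": "my_hudi_table",
    "hoodie.datasource.hive_sync.partition_fields": "creation_date",
}

for key in sorted(without_hive_sync(hudi_options)):
    print(key)
```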
   
   






[hudi] branch master updated (8321d20 -> ea154bc)

2021-12-09 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 8321d20  Claiming RFC for data skipping index for updated version 
(#4271)
 add ea154bc  Revert "Claiming RFC for data skipping index for updated 
version (#4271)" (#4272)

No new revisions were added by this update.

Summary of changes:
 rfc/README.md | 1 -
 1 file changed, 1 deletion(-)


[GitHub] [hudi] nsivabalan merged pull request #4272: [MINOR] Revert "Claiming RFC for data skipping index for updated version (#42…

2021-12-09 Thread GitBox


nsivabalan merged pull request #4272:
URL: https://github.com/apache/hudi/pull/4272


   






[GitHub] [hudi] nsivabalan opened a new pull request #4272: [MINOR] Revert "Claiming RFC for data skipping index for updated version (#42…

2021-12-09 Thread GitBox


nsivabalan opened a new pull request #4272:
URL: https://github.com/apache/hudi/pull/4272


   …71)"
   
   This reverts commit 8321d20c2cced15150621c9ad828f5ba9d79399a.
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990610366


   
   ## CI report:
   
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147)
 
   * 43c1e05bea47d18730eec37c24d94755d291c2f1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4150)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990626291


   
   ## CI report:
   
   * 43c1e05bea47d18730eec37c24d94755d291c2f1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4150)
 
   
   




[GitHub] [hudi] Arun-kc commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena

2021-12-09 Thread GitBox


Arun-kc commented on issue #4267:
URL: https://github.com/apache/hudi/issues/4267#issuecomment-990615939


   @Carl-Zhou-CN 
   The following are the Hudi options I'm using as of now.
   ```python
   hudiOptions = {
       "hoodie.table.name": "my_hudi_table",
       "hoodie.datasource.write.recordkey.field": "id",
       "hoodie.datasource.write.partitionpath.field": "creation_date",
       "hoodie.datasource.write.precombine.field": "last_update_time",
       "hoodie.datasource.hive_sync.enable": "true",
       "hoodie.datasource.hive_sync.table": "my_hudi_table",
       "hoodie.datasource.hive_sync.partition_fields": "creation_date",
       "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
       # This is required if we want to ensure we upsert a record, even if the partition changes
       "hoodie.index.type": "GLOBAL_BLOOM",
       # This is required to write the data into the new partition (defaults to false in 0.8.0, true in 0.9.0)
       "hoodie.bloom.index.update.partition.path": "true",
   }
   ```
   
   As for `hoodie.datasource.hive_sync.jdbcurl`, I'm not using Hive at the 
moment, so which URL should I specify? 
   
   I'm doing this in AWS Glue, using a Hudi connector. 
   






[hudi] branch master updated: Claiming RFC for data skipping index for updated version (#4271)

2021-12-09 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 8321d20  Claiming RFC for data skipping index for updated version 
(#4271)
8321d20 is described below

commit 8321d20c2cced15150621c9ad828f5ba9d79399a
Author: Sivabalan Narayanan 
AuthorDate: Thu Dec 9 23:37:42 2021 -0500

Claiming RFC for data skipping index for updated version (#4271)
---
 rfc/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rfc/README.md b/rfc/README.md
index 6c0b447..fe003d9 100644
--- a/rfc/README.md
+++ b/rfc/README.md
@@ -65,3 +65,4 @@ The list of all RFCs can be found here.
 | 39 | [Incremental source for Debezium](./rfc-39/rfc-39.md) | `IN PROGRESS` |
 | 40 | [Hudi Connector for Trino] | `UNDER REVIEW` |
 | 41 | [Hudi Snowflake Integration] | `UNDER REVIEW` |
+| 42 | [Updated version of Data skipping index] | `UNDER REVIEW` |
\ No newline at end of file


[GitHub] [hudi] codope merged pull request #4271: [HUDI-2973] Claiming RFC number for data skipping index (updated version)

2021-12-09 Thread GitBox


codope merged pull request #4271:
URL: https://github.com/apache/hudi/pull/4271


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-990597314


   
   ## CI report:
   
   * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-990611154


   
   ## CI report:
   
   * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990610366


   
   ## CI report:
   
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147)
 
   * 43c1e05bea47d18730eec37c24d94755d291c2f1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4150)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990609511


   
   ## CI report:
   
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147)
 
   * 43c1e05bea47d18730eec37c24d94755d291c2f1 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990609511


   
   ## CI report:
   
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147)
 
   * 43c1e05bea47d18730eec37c24d94755d291c2f1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990597210


   
   ## CI report:
   
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4271: [HUDI-2973] Claiming RFC number for data skipping index (updated version)

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4271:
URL: https://github.com/apache/hudi/pull/4271#issuecomment-990597335


   
   ## CI report:
   
   * b089271cd1db1ee41ed34018a9056450194cb900 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4271: [HUDI-2973] Claiming RFC number for data skipping index (updated version)

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4271:
URL: https://github.com/apache/hudi/pull/4271#issuecomment-990598889


   
   ## CI report:
   
   * b089271cd1db1ee41ed34018a9056450194cb900 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4149)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4271: [HUDI-2973] Claiming RFC number for data skipping index (updated version)

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4271:
URL: https://github.com/apache/hudi/pull/4271#issuecomment-990597335


   
   ## CI report:
   
   * b089271cd1db1ee41ed34018a9056450194cb900 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-990595447


   
   ## CI report:
   
   * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-990597314


   
   ## CI report:
   
   * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990569685


   
   ## CI report:
   
   * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146)
 
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990597210


   
   ## CI report:
   
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-2973) Rewrite/re-publish RFC for Data skipping index

2021-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2973:
-
Labels: pull-request-available  (was: )

> Rewrite/re-publish RFC for Data skipping index
> --
>
> Key: HUDI-2973
> URL: https://issues.apache.org/jira/browse/HUDI-2973
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-990595447


   
   ## CI report:
   
   * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] nsivabalan opened a new pull request #4271: [HUDI-2973] Claiming RFC number for data skipping index (updated version)

2021-12-09 Thread GitBox


nsivabalan opened a new pull request #4271:
URL: https://github.com/apache/hudi/pull/4271


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Created] (HUDI-2973) Rewrite/re-publish RFC for Data skipping index

2021-12-09 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-2973:
-

 Summary: Rewrite/re-publish RFC for Data skipping index
 Key: HUDI-2973
 URL: https://issues.apache.org/jira/browse/HUDI-2973
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Docs
Reporter: sivabalan narayanan








[jira] [Assigned] (HUDI-2973) Rewrite/re-publish RFC for Data skipping index

2021-12-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2973:
-

Assignee: sivabalan narayanan

> Rewrite/re-publish RFC for Data skipping index
> --
>
> Key: HUDI-2973
> URL: https://issues.apache.org/jira/browse/HUDI-2973
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>






[jira] [Updated] (HUDI-2973) Rewrite/re-publish RFC for Data skipping index

2021-12-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2973:
--
Parent: HUDI-1822
Issue Type: Sub-task  (was: Improvement)

> Rewrite/re-publish RFC for Data skipping index
> --
>
> Key: HUDI-2973
> URL: https://issues.apache.org/jira/browse/HUDI-2973
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: sivabalan narayanan
>Priority: Major
>






[jira] [Updated] (HUDI-2811) Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2811:
-
Labels: pull-request-available sev:critical  (was: sev:critical)

> Support Spark 3.2 and Parquet 1.12.x
> 
>
> Key: HUDI-2811
> URL: https://issues.apache.org/jira/browse/HUDI-2811
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Yann Byron
>Priority: Blocker
>  Labels: pull-request-available, sev:critical
> Fix For: 0.11.0
>
>
> Reported issues
>  * [https://github.com/apache/hudi/issues/4001]
>  * [https://github.com/apache/hudi/issues/3841]
>  * [https://github.com/apache/hudi/issues/4202]
>  * [https://github.com/apache/hudi/issues/3834]





[GitHub] [hudi] YannByron opened a new pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x

2021-12-09 Thread GitBox


YannByron opened a new pull request #4270:
URL: https://github.com/apache/hudi/pull/4270


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Support Spark 3.2 and Parquet 1.12.x.
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] YannByron commented on issue #4208: [SUPPORT] On Hudi 0.9.0 - Alter table throws java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.alterTable(java.lang.String, o

2021-12-09 Thread GitBox


YannByron commented on issue #4208:
URL: https://github.com/apache/hudi/issues/4208#issuecomment-990591132


   Hi, @BenjMaq
   I can't reproduce this issue. Can you check your environment? Based on the 
error above, my guess is that conflicting jars are causing this.






[GitHub] [hudi] nsivabalan commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-09 Thread GitBox


nsivabalan commented on a change in pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#discussion_r766321890



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/fs/FileSystemGuardConfig.java
##
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hudi.common.config.ConfigClassProperty;
+import org.apache.hudi.common.config.ConfigGroups;
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.config.HoodieConfig;
+
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.Properties;
+
+/**
+ * The consistency guard relevant config options.
+ */
+@ConfigClassProperty(name = "FileSystem Guard Configurations",
+groupName = ConfigGroups.Names.WRITE_CLIENT,
+description = "The filesystem guard related config options, to help deal with runtime exception like s3 list/get/put/delete performance issues.")
+public class FileSystemGuardConfig  extends HoodieConfig {

Review comment:
   Do you think naming this "FileSystemRetryConfig" would be more 
appropriate? 
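
For context on what a retry-oriented config would drive, the following is a minimal sketch of retrying a filesystem action with exponential backoff. It is not the implementation from this pull request; `RetrySketch`, `retry`, `maxRetries`, and `initialIntervalMs` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {
    // Hypothetical helper: runs the action and retries on IOException,
    // doubling the wait between attempts.
    static <T> T retry(Callable<T> action, int maxRetries, long initialIntervalMs) throws Exception {
        long waitMs = initialIntervalMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface the last failure
                }
                Thread.sleep(waitMs);
                waitMs *= 2; // back off before the next attempt
            }
        }
    }
}
```

Only `IOException` is retried in this sketch; any other exception propagates immediately, which keeps programming errors from being masked as transient store failures.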








[GitHub] [hudi] nsivabalan commented on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-09 Thread GitBox


nsivabalan commented on pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#issuecomment-990589947


   Sure, this makes sense if there are other cloud stores that need this retry. Can 
you please address the feedback already given? 






[GitHub] [hudi] YannByron edited a comment on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL

2021-12-09 Thread GitBox


YannByron edited a comment on issue #4154:
URL: https://github.com/apache/hudi/issues/4154#issuecomment-990561751


   Hey, @BenjMaq
   I can't reproduce this issue using your SQL on either Hudi 0.9 or 0.10.
   I used spark-2.4.4 from 
[here](https://archive.apache.org/dist/spark/spark-2.4.4/) and Hudi from 
[here](https://mvnrepository.com/artifact/org.apache.hudi/hudi-spark-bundle_2.11/0.9.0).






[GitHub] [hudi] YannByron edited a comment on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL

2021-12-09 Thread GitBox


YannByron edited a comment on issue #4154:
URL: https://github.com/apache/hudi/issues/4154#issuecomment-990561751


   Hey, @BenjMaq
   In both Hudi 0.9 and 0.10, `insert overwrite` works well.






[GitHub] [hudi] Rap70r commented on issue #4242: [SUPPORT] Split Data into Multiple Parquet files under Partitions

2021-12-09 Thread GitBox


Rap70r commented on issue #4242:
URL: https://github.com/apache/hudi/issues/4242#issuecomment-990582235


   Got it, thank you. You mentioned that we can employ clustering to batch a lot 
of small files together. Is there a specific configuration we need to set to 
achieve that? We are running Hudi on Spark using EMR.
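
As a rough illustration of the kind of configuration being asked about, here is a sketch of writer options that turn on inline clustering. The key names are given as recalled from Hudi's clustering documentation and should be verified against the docs for the Hudi version in use; the values and the `ClusteringOptionsSketch` class are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClusteringOptionsSketch {
    // Builds writer options that enable inline clustering; all values are illustrative.
    static Map<String, String> clusteringOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        // Run clustering as part of the write, every 4 commits (assumed cadence).
        opts.put("hoodie.clustering.inline", "true");
        opts.put("hoodie.clustering.inline.max.commits", "4");
        // Files below ~300 MB are clustering candidates (assumed threshold).
        opts.put("hoodie.clustering.plan.strategy.small.file.limit",
                String.valueOf(300L * 1024 * 1024));
        // Aim for output files of up to ~1 GB (assumed target).
        opts.put("hoodie.clustering.plan.strategy.target.file.max.bytes",
                String.valueOf(1024L * 1024 * 1024));
        return opts;
    }
}
```

A map like this would typically be passed to the Spark DataFrame writer alongside the other Hudi write options.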






[GitHub] [hudi] YannByron edited a comment on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL

2021-12-09 Thread GitBox


YannByron edited a comment on issue #4154:
URL: https://github.com/apache/hudi/issues/4154#issuecomment-990561751


   Hey, @BenjMaq
   In both Hudi 0.9 and 0.10, `insert overwrite` works well. My Spark 
version is 2.4.7, but I think that's OK.






[GitHub] [hudi] Carl-Zhou-CN commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena

2021-12-09 Thread GitBox


Carl-Zhou-CN commented on issue #4267:
URL: https://github.com/apache/hudi/issues/4267#issuecomment-990576501


   @Arun-kc It looks like a connection problem. Please check 
`hoodie.datasource.hive_sync.jdbcurl`; it seems to be set to the default value right now.






[jira] [Assigned] (HUDI-2903) get table schema from the last commit with data written

2021-12-09 Thread Yann Byron (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yann Byron reassigned HUDI-2903:


Assignee: Yann Byron

> get table schema from the last commit with data written
> ---
>
> Key: HUDI-2903
> URL: https://issues.apache.org/jira/browse/HUDI-2903
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> If the last operation is `delete_partition`, the 
> `HoodieCommitMetadata` object from the last commit will have an empty 
> `getFileIdAndRelativePaths`, so we can't get the table schema from it.
> Instead, I want to find the last commit in which data was written.
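
The idea described above can be sketched as a newest-to-oldest scan that skips commits which wrote no files. This is only an illustration under stated assumptions: `CommitInfo` is a hypothetical stand-in for commit metadata, not Hudi's `HoodieCommitMetadata`, and the input list is assumed to be ordered oldest first.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class LastDataCommitSketch {
    // Hypothetical stand-in for commit metadata: instant time plus fileId -> relative path.
    static final class CommitInfo {
        final String instantTime;
        final Map<String, String> fileIdToPath;

        CommitInfo(String instantTime, Map<String, String> fileIdToPath) {
            this.instantTime = instantTime;
            this.fileIdToPath = fileIdToPath;
        }
    }

    // Walks the commits from newest to oldest and returns the first one that wrote files.
    static Optional<CommitInfo> lastCommitWithData(List<CommitInfo> commitsOldestFirst) {
        for (int i = commitsOldestFirst.size() - 1; i >= 0; i--) {
            CommitInfo commit = commitsOldestFirst.get(i);
            if (!commit.fileIdToPath.isEmpty()) {
                return Optional.of(commit);
            }
        }
        return Optional.empty(); // no commit has written data yet
    }
}
```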





[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990569685


   
   ## CI report:
   
   * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146)
 
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] nsivabalan commented on issue #4242: [SUPPORT] Split Data into Multiple Parquet files under Partitions

2021-12-09 Thread GitBox


nsivabalan commented on issue #4242:
URL: https://github.com/apache/hudi/issues/4242#issuecomment-990569759


   yes. 






[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990552126


   
   ## CI report:
   
   * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146)
 
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] YannByron commented on pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql

2021-12-09 Thread GitBox


YannByron commented on pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#issuecomment-990569190


   @nsivabalan @xushiyan please help to review this.






[jira] [Updated] (HUDI-2878) Enhance hudi-quick start guide for spark-sql

2021-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2878:
-
Labels: pull-request-available  (was: )

> Enhance hudi-quick start guide for spark-sql
> 
>
> Key: HUDI-2878
> URL: https://issues.apache.org/jira/browse/HUDI-2878
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: sivabalan narayanan
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> We should try to streamline the entire quick start guide using a single flow/table 
> from start to end. As of now, every operation shows 3 to 4 options, but 
> when we move to, say, update, it does not reuse the table from the "insert" 
> section. 
>  
> If we look at the Scala quick start guide, we just use the same table from start 
> to end, so it gives users a good end-to-end runbook. For 
> spark-sql, we don't have that now. For instance, if someone wants to try out 
> delete, they have to create a table by themselves and then go about deleting 
> based on the delete examples given in our quick start guide. 
>  
> We need to go over different ways to do an operation (e.g., create a table with and 
> without primary keys), but at least for one table configuration, it would be 
> good to have the entire flow covered. 





[GitHub] [hudi] zztttt commented on issue #4072: [SUPPORT]Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6

2021-12-09 Thread GitBox


zztttt commented on issue #4072:
URL: https://github.com/apache/hudi/issues/4072#issuecomment-990568201


   > ```
   > Caused by: ERROR XSDB6: Another instance of Derby may have already booted 
the database /home/zzt/code/spark-debug/metastore_db.
   > ```
   > 
   > guess you already have another instance of derby running, i.e. another 
spark-shell running and trying to write to hudi or something.
   
   Thanks for your help! I still hit this problem when I try to store the 
metadata in the Hive metastore using the Derby database approach. I worked around the 
problem by using the relational database approach of the Hive metastore, and that 
did work several days ago. 
   I can confirm that there is only one SparkSession instance running in the 
project, and I delete the metadata_db directory every time before I start the 
project, but this doesn't help. It's confusing, so I use another storage 
backend instead. 






[GitHub] [hudi] YannByron opened a new pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql

2021-12-09 Thread GitBox


YannByron opened a new pull request #4269:
URL: https://github.com/apache/hudi/pull/4269


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] YannByron commented on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL

2021-12-09 Thread GitBox


YannByron commented on issue #4154:
URL: https://github.com/apache/hudi/issues/4154#issuecomment-990561751


   Hey, @BenjMaq
   I tested this and it works in version 0.10. Can you try Hudi 0.10?






[GitHub] [hudi] hudi-bot removed a comment on pull request #4222: [HUDI-2849] improve SparkUI job description for write path

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#issuecomment-990527370


   
   ## CI report:
   
   * a085e101422d1df36b94127e75e5d60716986e69 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4110)
 
   * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4222: [HUDI-2849] improve SparkUI job description for write path

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#issuecomment-990556714


   
   ## CI report:
   
   * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990550517


   
   ## CI report:
   
   * c454677b96fab062cf31634426646d741ac9dbe5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928)
 
   * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146)
 
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990552126


   
   ## CI report:
   
   * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146)
 
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990550517


   
   ## CI report:
   
   * c454677b96fab062cf31634426646d741ac9dbe5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928)
 
   * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146)
 
   * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990534719


   
   ## CI report:
   
   * c454677b96fab062cf31634426646d741ac9dbe5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928)
 
   * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146)
 
   
   




[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


xiarixiaoyao commented on a change in pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#discussion_r766290661



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
##
@@ -246,6 +245,16 @@ public MultipleSparkJobExecutionStrategy(HoodieTable 
table, HoodieEngineContext
 }).map(this::transform);
   }
 
+  private static <T> CompletableFuture<List<T>> allOf(@Nonnull List<CompletableFuture<T>> futures) {

Review comment:
   ok








[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


alexeykudinkin commented on a change in pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#discussion_r766290137



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
##
@@ -91,13 +92,11 @@ public MultipleSparkJobExecutionStrategy(HoodieTable table, 
HoodieEngineContext
 // execute clustering for each group async and collect WriteStatus
 JavaSparkContext engineContext = 
HoodieSparkEngineContext.getSparkContext(getEngineContext());
 // execute clustering for each group async and collect WriteStatus
-Stream<JavaRDD<WriteStatus>> writeStatusRDDStream = clusteringPlan.getInputGroups().stream()
+Stream<JavaRDD<WriteStatus>> writeStatusRDDStream = allOf(clusteringPlan.getInputGroups().stream()
 .map(inputGroup -> runClusteringForGroupAsync(inputGroup,
 clusteringPlan.getStrategy().getStrategyParams(),
 
Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false),
-instantTime))
-.map(CompletableFuture::join);
-
+instantTime)).collect(Collectors.toList())).join().stream();

Review comment:
   Can you please re-format this snippet so that the chained calls are 
stacked, making it easy to see which method is invoked on which expression? 
   
   Like the following: 
   
   ```
   allOf(
       clusteringPlan.getInputGroups().stream()
           .map(inputGroup -> runClusteringForGroupAsync(inputGroup,
               clusteringPlan.getStrategy().getStrategyParams(),
               Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false),
               instantTime))
           .collect(Collectors.toList()))
       .join()
       .stream()
   ```
   
   It's very hard to understand what is going on there right now 
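
For context on why the diff above collects the futures before joining them, here is a minimal, self-contained sketch. It is not Hudi code: the class `LazyJoinDemo` and its methods are hypothetical, and `CompletableFuture.supplyAsync` stands in for `runClusteringForGroupAsync`. A sequential Java stream pulls one element at a time through all of its stages, so a `.map(CompletableFuture::join)` in the same pipeline waits for each task before the next one is even submitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; it only records the order of submit/join events.
public class LazyJoinDemo {

  // Join inside the same lazy pipeline: element i is submitted and joined
  // before element i + 1 is submitted, so the tasks effectively run one by one.
  public static List<String> joinInPipeline(int n) {
    List<String> events = new ArrayList<>();
    IntStream.range(0, n).boxed()
        .map(i -> {
          events.add("submit " + i);
          return CompletableFuture.supplyAsync(() -> i);
        })
        .map(f -> {
          int v = f.join();
          events.add("join " + v);
          return v;
        })
        .collect(Collectors.toList());
    return events;
  }

  // Collect all futures first, then join: every task is submitted before the
  // first join blocks, so the tasks can overlap.
  public static List<String> collectThenJoin(int n) {
    List<String> events = new ArrayList<>();
    List<CompletableFuture<Integer>> futures = IntStream.range(0, n).boxed()
        .map(i -> {
          events.add("submit " + i);
          return CompletableFuture.supplyAsync(() -> i);
        })
        .collect(Collectors.toList());
    futures.stream()
        .map(f -> {
          int v = f.join();
          events.add("join " + v);
          return v;
        })
        .collect(Collectors.toList());
    return events;
  }

  public static void main(String[] args) {
    System.out.println(joinInPipeline(2));  // [submit 0, join 0, submit 1, join 1]
    System.out.println(collectThenJoin(2)); // [submit 0, submit 1, join 0, join 1]
  }
}
```

The event order is recorded on the calling thread, so it does not depend on timing; only the second variant leaves room for the submitted tasks to run in parallel.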








[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990534719


   
   ## CI report:
   
   * c454677b96fab062cf31634426646d741ac9dbe5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928)
 
   * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990532933


   
   ## CI report:
   
   * c454677b96fab062cf31634426646d741ac9dbe5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928)
 
   * 2589cfb570762c4dca5968fae72f9b7948a69f31 UNKNOWN
   
   




[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


alexeykudinkin commented on a change in pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#discussion_r766289221



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
##
@@ -246,6 +245,16 @@ public MultipleSparkJobExecutionStrategy(HoodieTable 
table, HoodieEngineContext
 }).map(this::transform);
   }
 
+  private static <T> CompletableFuture<List<T>> allOf(@Nonnull List<CompletableFuture<T>> futures) {

Review comment:
   Let's extract this to a common utility `FutureUtil` (into `hudi-common`) 
so it could be re-used
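
As a rough illustration of what such a shared helper might look like, here is a minimal sketch. Only the name `FutureUtil` and the `allOf` signature come from this thread; the body is an assumption built on the standard `CompletableFuture.allOf`, not the code that was merged.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical utility class; the body is an assumption, not the merged Hudi code.
public final class FutureUtil {

  private FutureUtil() {
  }

  // Returns a future that completes once every input future has completed and
  // yields their results in input order. If any input future fails, the
  // returned future completes exceptionally as well.
  public static <T> CompletableFuture<List<T>> allOf(List<CompletableFuture<T>> futures) {
    return CompletableFuture
        .allOf(futures.toArray(new CompletableFuture[0]))
        .thenApply(ignored -> futures.stream()
            // Every future is already complete here, so join() does not block.
            .map(CompletableFuture::join)
            .collect(Collectors.toList()));
  }
}
```

A caller would use it as `FutureUtil.allOf(futures).join().stream()`, which matches the shape of the call chain in the diff under review.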








[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-990532933


   
   ## CI report:
   
   * c454677b96fab062cf31634426646d741ac9dbe5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928)
 
   * 2589cfb570762c4dca5968fae72f9b7948a69f31 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-09 Thread GitBox


hudi-bot removed a comment on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-984239555


   
   ## CI report:
   
   * c454677b96fab062cf31634426646d741ac9dbe5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4222: [HUDI-2849] improve SparkUI job description for write path

2021-12-09 Thread GitBox


hudi-bot commented on pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#issuecomment-990527370


   
   ## CI report:
   
   * a085e101422d1df36b94127e75e5d60716986e69 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4110)
 
   * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145)
 
   
   



