[GitHub] [hudi] hudi-bot removed a comment on pull request #4236: [HUDI-2936] Add data count checks in async clustering tests
hudi-bot removed a comment on pull request #4236: URL: https://github.com/apache/hudi/pull/4236#issuecomment-990699643 ## CI report: * e4908379cb7faee6bdc554b0937b9a4557797eea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4071) * 4871c93376740dfc1d53ed7942d4eb96d8c1f0b7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4236: [HUDI-2936] Add data count checks in async clustering tests
hudi-bot commented on pull request #4236: URL: https://github.com/apache/hudi/pull/4236#issuecomment-990705653 ## CI report: * e4908379cb7faee6bdc554b0937b9a4557797eea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4071) * 4871c93376740dfc1d53ed7942d4eb96d8c1f0b7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4157) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4236: [HUDI-2936] Add data count checks in async clustering tests
hudi-bot commented on pull request #4236: URL: https://github.com/apache/hudi/pull/4236#issuecomment-990699643 ## CI report: * e4908379cb7faee6bdc554b0937b9a4557797eea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4071) * 4871c93376740dfc1d53ed7942d4eb96d8c1f0b7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4236: [HUDI-2936] Add data count checks in async clustering tests
hudi-bot removed a comment on pull request #4236: URL: https://github.com/apache/hudi/pull/4236#issuecomment-987617704 ## CI report: * e4908379cb7faee6bdc554b0937b9a4557797eea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4071) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable
hudi-bot removed a comment on pull request #4274: URL: https://github.com/apache/hudi/pull/4274#issuecomment-990655541 ## CI report: * 1e718c4bcfe432a4ac03f807c889c67ee8d962ae Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4154) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable
hudi-bot commented on pull request #4274: URL: https://github.com/apache/hudi/pull/4274#issuecomment-990683626 ## CI report: * 1e718c4bcfe432a4ac03f807c889c67ee8d962ae Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4154) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] suribabu-un commented on issue #4151: [SUPPORT] ClassNotFoundException: org.apache.hudi.hadoop.HoodieParquetInputFormat while running hive queries in EMR
suribabu-un commented on issue #4151: URL: https://github.com/apache/hudi/issues/4151#issuecomment-990681320

The issue is unrelated to Hudi; it is caused by LLAP running in the EMR cluster. As mentioned above, if LLAP is disabled, queries run as expected.

The issue can be resolved by creating a new LLAP bundle (using the `hive --service llap` command) that includes all the jars from hive.aux.jars.path via the --auxjars parameter (aws-java-sdk may also need to be included), stopping the running server, and launching the server from the new bundle.
[GitHub] [hudi] suribabu-un closed issue #4151: [SUPPORT] ClassNotFoundException: org.apache.hudi.hadoop.HoodieParquetInputFormat while running hive queries in EMR
suribabu-un closed issue #4151: URL: https://github.com/apache/hudi/issues/4151 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
hudi-bot removed a comment on pull request #4270: URL: https://github.com/apache/hudi/pull/4270#issuecomment-990661766 ## CI report: * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148) * bcc62e5eeea6a2929e4144c00f2d0b29bcc786cd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4155) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
hudi-bot commented on pull request #4270: URL: https://github.com/apache/hudi/pull/4270#issuecomment-990681037 ## CI report: * bcc62e5eeea6a2929e4144c00f2d0b29bcc786cd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4155) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YannByron commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql
YannByron commented on a change in pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#discussion_r766402385

## File path: website/docs/quick-start-guide.md ##
@@ -175,18 +175,163 @@ values={[
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. **type = 'cow'** represents COW table, while **type = 'mor'** represents MOR table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and external.
+  If one specifies a location using **location** statement or use `create external table` to create table explicitly, it is an external table, else its considered a managed table.
+  You can read more about external vs managed tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table is considered a non-primary keyed table.

Review comment:
I'll remove `Table with primary key`, which is redundant with the `notes` below, and move the notes here.
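The **type** and **primaryKey** options described in the quoted hunk can be illustrated with a short statement. This is only a minimal sketch, assuming a Spark SQL session with the Hudi bundle available; the table name `hudi_mor_example_tbl` and its columns are hypothetical, and the `tblproperties` keys are the ones the quoted documentation itself uses.

```sql
-- Hypothetical example: a non-partitioned Merge-On-Read table.
-- type = 'mor' selects the MOR table type; primaryKey and preCombineField
-- name columns of this (made-up) schema.
create table hudi_mor_example_tbl (
  id int,
  name string,
  price double,
  ts bigint
) using hudi
tblproperties (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts'
);
```

Changing `type = 'mor'` to `type = 'cow'` in the same statement would give the Copy-On-Write variant.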
[GitHub] [hudi] JoshuaZhuCN opened a new issue #4275: [SUPPORT] How can I control the number of archive files
JoshuaZhuCN opened a new issue #4275: URL: https://github.com/apache/hudi/issues/4275

When I use async clustering, a lot of archive files are generated, with names similar to commits.archive.xx_1-0-1. How can I control or clean up the number of these files?

**Environment Description**

* Hudi version : 0.9.0
* Spark version : 2.4.7
* Hive version : ~
* Hadoop version : 3.1.1
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no
[GitHub] [hudi] hudi-bot commented on pull request #4222: [HUDI-2849] improve SparkUI job description for write path
hudi-bot commented on pull request #4222: URL: https://github.com/apache/hudi/pull/4222#issuecomment-990672722 ## CI report: * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4153) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4222: [HUDI-2849] improve SparkUI job description for write path
hudi-bot removed a comment on pull request #4222: URL: https://github.com/apache/hudi/pull/4222#issuecomment-990651217 ## CI report: * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4153) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YannByron commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql
YannByron commented on a change in pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#discussion_r766402385

## File path: website/docs/quick-start-guide.md ##
@@ -175,18 +175,163 @@ values={[
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. **type = 'cow'** represents COW table, while **type = 'mor'** represents MOR table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and external.
+  If one specifies a location using **location** statement or use `create external table` to create table explicitly, it is an external table, else its considered a managed table.
+  You can read more about external vs managed tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table is considered a non-primary keyed table.

Review comment:
I'll remove the `Table Types` and `Table with primary key` sections, which are redundant with `Create Table Properties` below, and move `Create Table Properties` and the notes here.
[GitHub] [hudi] YannByron commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql
YannByron commented on a change in pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#discussion_r766387860

## File path: website/docs/quick-start-guide.md ##
@@ -175,18 +175,163 @@ values={[
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. **type = 'cow'** represents COW table, while **type = 'mor'** represents MOR table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and external.
+  If one specifies a location using **location** statement or use `create external table` to create table explicitly, it is an external table, else its considered a managed table.
+  You can read more about external vs managed tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table is considered a non-primary keyed table.
+  One needs to set **primaryKey** column in options to create a primary key table.
+  If you are using any of the built-in key generators in Hudi, likely it is a primary key table.
+
+Let's go over some of the create table commands.
+
+**Create a Non-Partitioned Table**
+
 ```sql
--- 
-create table if not exists hudi_table2(
-  id int,
-  name string,
+-- create a cow table, with default primaryKey 'uuid' and without preCombineField provided
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
   price double
+) using hudi;
+
+
+-- create a mor non-partitioned table without preCombineField provided
+create table hudi_mor_tbl (
+  id int,
+  name string,
+  price double,
+  ts bigint
 ) using hudi
-options (
-  type = 'cow'
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
 );
 ```
+Here is an example of creating an external COW partitioned table.
+
+**Create Partitioned Table**
+
+```sql
+-- create a partitioned, preCombineField-provided cow table
+create table hudi_cow_pt_tbl (
+  id bigint,
+  name string,
+  ts bigint,
+  dt string,
+  hh string
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (dt, hh)
+location '/tmp/hudi/hudi_cow_pt_tbl';
+```
+
+**Create Table for an existing Hudi Table**
+
+We can create a table on an existing hudi table(created with spark-shell or deltastreamer). This is useful to
+read/write to/from a pre-existing hudi table.
+
+```sql
+-- create an external hudi table based on an existing path
+
+-- for non-partitioned table
+create table hudi_existing_tbl0 using hudi
+location 'file:///tmp/hudi/dataframe_hudi_nonpt_table';
+
+-- for partitioned table
+create table hudi_existing_tbl1 using hudi
+partitioned by (dt, hh)
+location 'file:///tmp/hudi/dataframe_hudi_pt_table';
+```
+
+:::tip
+You don't need to specify schema and any properties except the partitioned columns if existed. Hudi can automatically recognize the schema and configurations.
+:::
+
+**CTAS**
+
+Hudi supports CTAS(Create Table As Select) on spark sql.
+Note: For better performance to load data to hudi table, CTAS uses the **bulk insert** as the write operation.
+
+Example CTAS command to create a non-partitioned COW table without preCombineField.
+
+```sql
+-- CTAS: create a non-partitioned cow table without preCombineField
+create table hudi_ctas_cow_nonpcf_tbl
+using hudi
+tblproperties (primaryKey = 'id')
+as
+select 1 as id, 'a1' as name, 10 as price;
+```
+
+Example CTAS command to create a partitioned, primary key COW table.
+
+```sql
+-- CTAS: create a partitioned, preCombineField-provided cow table
+create table hudi_ctas_cow_pt_tbl
+using hudi
+tblproperties (type = 'cow', primaryKey = 'id', preCombineField = 'ts')
+partitioned by (dt)
+as
+select 1 as id, 'a1' as name, 10 as price, 1000 as ts, '2021-12-01' as dt;
+
+```
+
+Example CTAS command to load data from another table.
+
+```sql
+# create managed parquet table
+create table parquet_mngd using parquet location 'file:///tmp/parquet_dataset/*.parquet';
+
+# CTAS by loading data into hudi table
+create table hudi_ctas_cow_pt_tbl2 using hudi location 'file:/tmp/hudi/hudi_tbl/' options (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (datestr) as select * from parquet_mngd;
+```
+
+**Create Table Properties**
+
+Users can set table
[hudi] branch asf-site updated (34e151d -> d003ae0)
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git. from 34e151d [MINOR] Fix asf-site build error (#4273) add d003ae0 Travis CI build asf-site No new revisions were added by this update. Summary of changes: content/404.html | 12 ++-- content/404/index.html | 12 ++-- content/assets/css/styles.32d50a6e.css | 1 + content/assets/css/styles.f788c9dd.css | 25 ...java_after-55881866a88c6c761b91623f020f919d.png | Bin ...ava_before-4380ebd14248afbd45938ccf55d96781.png | Bin .../IDE_setup_code_style_java_after.png| Bin .../IDE_setup_code_style_java_before.png | Bin .../{0030fd86.618769a9.js => 0030fd86.387e8470.js} | 0 .../{009f67ce.dc895153.js => 009f67ce.951d7d64.js} | 2 +- .../{02e54e09.e89eb23f.js => 02e54e09.af4b8497.js} | 0 .../{02ff5d42.a8fe5023.js => 02ff5d42.d0dcdc4a.js} | 0 .../{0480b142.cf906eb8.js => 0480b142.81014b37.js} | 0 .../{04b49851.a41ad5b9.js => 04b49851.abc08d16.js} | 0 .../{078339bb.bb8597fd.js => 078339bb.0c40da37.js} | 0 .../{07deb48b.0a2ea837.js => 07deb48b.f609e50b.js} | 2 +- .../{0871002b.f48a7e3d.js => 0871002b.8d044703.js} | 0 .../{09138901.1755be46.js => 09138901.3d66d075.js} | 0 .../{09ff3d76.2835ca67.js => 09ff3d76.94712ee3.js} | 0 .../{0a91021f.9db6d0c4.js => 0a91021f.91532e24.js} | 0 .../{0b82d45d.bc3b7671.js => 0b82d45d.21d71e74.js} | 0 .../{0c12eeea.26d92167.js => 0c12eeea.487f5f85.js} | 0 .../{0c3d0366.9093a5b3.js => 0c3d0366.dde224ad.js} | 0 .../{1007513a.56e6d13c.js => 1007513a.cc672f90.js} | 0 .../{10ac9a3e.5fd72179.js => 10ac9a3e.8d0896cc.js} | 0 content/assets/js/10b6d210.865ababb.js | 1 + content/assets/js/10b6d210.9ec23b9b.js | 1 - .../{12b957b7.bc758db5.js => 12b957b7.a8f1703d.js} | 0 .../{149a2d9e.aaeb5871.js => 149a2d9e.31e38105.js} | 0 .../{15ea2a5f.80750ba1.js => 15ea2a5f.88c93791.js} | 0 content/assets/js/17896441.362ddb10.js | 1 + content/assets/js/17896441.a8c03b97.js | 1 - .../{19560f91.4c5f7133.js => 
19560f91.51fadab6.js} | 0 .../{1a20bc57.06f5bfb5.js => 1a20bc57.07c04f1b.js} | 0 .../{1be78505.d2e6b112.js => 1be78505.47ce07ac.js} | 2 +- .../{1c3a958e.483821fa.js => 1c3a958e.96a08cd0.js} | 0 .../{1db64337.0309d4b8.js => 1db64337.062e874a.js} | 0 .../{1dba1ecf.fe58e182.js => 1dba1ecf.0187c054.js} | 0 .../{1efbb938.e17be128.js => 1efbb938.ba9c353a.js} | 0 content/assets/js/1f391b9e.3e4c536c.js | 1 - content/assets/js/1f391b9e.7bd79868.js | 1 + .../{1f8198a4.a01c3cfe.js => 1f8198a4.410dd3bb.js} | 0 .../{1f97a7ff.4bd86959.js => 1f97a7ff.94f6a34f.js} | 0 .../{20a6876f.1bc702ae.js => 20a6876f.c9a3a955.js} | 0 .../{2153fb85.9c87dbe7.js => 2153fb85.809961c2.js} | 0 .../{2263a65b.a891b40d.js => 2263a65b.78dc9fb7.js} | 0 .../{23421dc8.c1f1f613.js => 23421dc8.d413b5dd.js} | 0 .../{244c7b0a.b3d63e1b.js => 244c7b0a.bd4a4ba1.js} | 0 .../{246d116d.64c3a7db.js => 246d116d.9ab3cee3.js} | 0 .../{24f4e7d7.d7c2d76f.js => 24f4e7d7.16edd9c9.js} | 0 .../{25aa47d2.42b070d2.js => 25aa47d2.b743e786.js} | 0 .../{26115f23.eade5b49.js => 26115f23.8298cff5.js} | 0 .../{261fe657.c0cd50b5.js => 261fe657.6cb7d5c3.js} | 0 .../{2760fb69.85be465a.js => 2760fb69.46962a6d.js} | 0 .../{2884dc3d.f6414a49.js => 2884dc3d.70aa2361.js} | 0 .../{2947aa63.c0591c02.js => 2947aa63.0118dd72.js} | 0 .../{29a0dcae.467bd8b9.js => 29a0dcae.a8f7cb8d.js} | 0 .../{29db9f25.48087af2.js => 29db9f25.b91c3f3b.js} | 0 .../{2a11e6a7.cd24f7a3.js => 2a11e6a7.95167bfc.js} | 0 .../{2a5e97be.7e661803.js => 2a5e97be.f7d21b42.js} | 0 .../{2a74f6a7.1e0d498c.js => 2a74f6a7.ded94ab5.js} | 0 .../{2a7d5452.acceaca1.js => 2a7d5452.51c4f429.js} | 0 .../{2aa42d18.b915ac19.js => 2aa42d18.868730cb.js} | 0 .../{2b154460.9fffdbc1.js => 2b154460.e89a64a8.js} | 0 .../{2b4cfa56.17268ab4.js => 2b4cfa56.62062312.js} | 0 .../{2da5f59f.07b82329.js => 2da5f59f.b54ed0ce.js} | 0 .../{2dada088.ee14934b.js => 2dada088.1bf958c0.js} | 0 .../{2dcd9099.d9a4c00b.js => 2dcd9099.7d58768f.js} | 0 .../{2df3fdca.e2b1589b.js => 2df3fdca.3988e307.js} | 0 
.../{2e72ea50.5e68f3da.js => 2e72ea50.112834ea.js} | 0 .../{2e7e1134.a60cc0aa.js => 2e7e1134.fc54a73f.js} | 0 .../{2fe15297.af8adbbe.js => 2fe15297.57295fe9.js} | 0 .../{306a8c6c.d9a4a611.js => 306a8c6c.a37a4615.js} | 0 .../{32eb34e5.51b43dcf.js => 32eb34e5.0cee5193.js} | 0 .../{33ab05f6.784976e3.js => 33ab05f6.81636ed8.js} | 0 .../{3415fffa.0a7bfd48.js => 3415fffa.5097447d.js} | 0 .../{3523854b.9c36f3b7.js => 3523854b.c61a8e80.js} | 0 .../{3533dbd1.2827e62b.js => 3533dbd1.ae176c3b.js} | 2 +- .../{35f2b245.51a60e98.js => 35f2b245.1b242bed.js} | 0 .../{370287c4.39768faa.js =>
[GitHub] [hudi] YannByron commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql
YannByron commented on a change in pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#discussion_r766383550

## File path: website/docs/quick-start-guide.md ##
@@ -175,18 +175,163 @@ values={[
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. **type = 'cow'** represents COW table, while **type = 'mor'** represents MOR table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and external.
+  If one specifies a location using **location** statement or use `create external table` to create table explicitly, it is an external table, else its considered a managed table.
+  You can read more about external vs managed tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table is considered a non-primary keyed table.

Review comment:
OK, I'll update here.
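The managed versus external distinction in the quoted hunk comes down to whether a **location** is given. The two statements below are a hedged sketch of that rule only, assuming a Spark SQL session with Hudi available; the table names, columns, and the `/tmp/hudi/...` path are hypothetical and chosen purely for illustration.

```sql
-- Hypothetical managed table: no location clause, so the storage path
-- is chosen by the catalog.
create table hudi_managed_example_tbl (
  id int,
  name string,
  ts bigint
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts'
);

-- Hypothetical external table: an explicit location clause makes it external,
-- per the rule stated in the quoted documentation.
create table hudi_external_example_tbl (
  id int,
  name string,
  ts bigint
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts'
)
location '/tmp/hudi/hudi_external_example_tbl';
```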
[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
hudi-bot removed a comment on pull request #4270: URL: https://github.com/apache/hudi/pull/4270#issuecomment-990660728 ## CI report: * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148) * bcc62e5eeea6a2929e4144c00f2d0b29bcc786cd UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
hudi-bot commented on pull request #4270: URL: https://github.com/apache/hudi/pull/4270#issuecomment-990661766 ## CI report: * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148) * bcc62e5eeea6a2929e4144c00f2d0b29bcc786cd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4155) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
hudi-bot commented on pull request #4270: URL: https://github.com/apache/hudi/pull/4270#issuecomment-990660728 ## CI report: * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148) * bcc62e5eeea6a2929e4144c00f2d0b29bcc786cd UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
hudi-bot removed a comment on pull request #4270: URL: https://github.com/apache/hudi/pull/4270#issuecomment-990611154 ## CI report: * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on a change in pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql
xushiyan commented on a change in pull request #4269:
URL: https://github.com/apache/hudi/pull/4269#discussion_r766367063

## File path: website/docs/quick-start-guide.md ##
@@ -175,18 +175,163 @@ values={[
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be created using spark-sql.

Review comment:
```suggestion
  Both of Hudi's table types (Copy-On-Write (COW) and Merge-On-Read (MOR)) can be created using Spark SQL.
```

## File path: website/docs/quick-start-guide.md ##
@@ -175,18 +175,163 @@ values={[
+Spark-sql needs an explicit create table command.

Review comment:
```suggestion
Spark SQL needs an explicit create table command.
```

## File path: website/docs/quick-start-guide.md ##
@@ -175,18 +175,163 @@ values={[
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. **type = 'cow'** represents COW table, while **type = 'mor'** represents MOR table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.

Review comment:
```suggestion
  Users can create a partitioned table or a non-partitioned table in Spark SQL.
```

## File path: website/docs/quick-start-guide.md ##
@@ -175,18 +175,163 @@ values={[
+Spark-sql needs an explicit create table command.
+
+- Table types:
+  Both types of hudi tables (CopyOnWrite (COW) and MergeOnRead (MOR)) can be created using spark-sql.
+
+  While creating the table, table type can be specified using **type** option. **type = 'cow'** represents COW table, while **type = 'mor'** represents MOR table.
+
+- Partitioned & Non-Partitioned table:
+  Users can create a partitioned table or non-partitioned table in spark-sql.
+  To create a partitioned table, one needs to use **partitioned by** statement to specify the partition columns to create a partitioned table.
+  When there is no **partitioned by** statement with create table command, table is considered to be a non-partitioned table.
+
+- Managed & External table:
+  In general, spark-sql supports two kinds of tables, namely managed and external.
+  If one specifies a location using **location** statement or use `create external table` to create table explicitly, it is an external table, else its considered a managed table.
+  You can read more about external vs managed tables [here](https://sparkbyexamples.com/apache-hive/difference-between-hive-internal-tables-and-external-tables/).
+
+- Table with primary key:
+  Users can choose to create a table with primary key as required. Else table is considered a non-primary keyed table.
+  One needs to set **primaryKey** column in options to create a primary key table.
+  If you are using any of the built-in key generators in Hudi, likely it is a primary key table.
+
+Let's go over some of the create table commands.
+
+**Create a Non-Partitioned Table**
+
 ```sql
--- 
-create table if not exists hudi_table2(
-  id int,
-  name string,
+-- create a cow table, with default primaryKey 'uuid' and without preCombineField provided
+create table hudi_cow_nonpcf_tbl (
+  uuid int,
+  name string,
   price double
+) using hudi;
+
+
+-- create a mor non-partitioned table without preCombineField provided
+create table hudi_mor_tbl (
+  id int,
+  name string,
+  price double,
+  ts bigint
 ) using hudi
-options (
-  type = 'cow'
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
 );
 ```
+Here is an example of creating an external COW partitioned table.
+
+**Create Partitioned Table**
+
+```sql
+-- create a partitioned, preCombineField-provided cow table
+create table hudi_cow_pt_tbl (
+  id bigint,
+  name string,
+  ts bigint,
+  dt string,
+  hh string
+) using hudi
+tblproperties (
+  type = 'cow',
+  primaryKey = 'id',
+  preCombineField = 'ts'
+ )
+partitioned by (dt, hh)
+location '/tmp/hudi/hudi_cow_pt_tbl';
+```
+
+**Create Table for an existing Hudi Table**
+
+We can create a table on an existing hudi table(created with spark-shell or deltastreamer). This is useful to
+read/write to/from a pre-existing hudi table.
+
+```sql
+-- create an external hudi table based on an existing path
+
+-- for non-partitioned table
+create table hudi_existing_tbl0 using hudi
+location 'file:///tmp/hudi/dataframe_hudi_nonpt_table';
+
+-- for partitioned table
+create table hudi_existing_tbl1 using hudi
+partitioned by (dt, hh)
+location 'file:///tmp/hudi/dataframe_hudi_pt_table';
+```
+
+:::tip
+You don't need to specify schema and any properties except the partitioned columns if existed. Hudi can automatically recognize the schema and configurations.
[hudi] branch master updated (ea154bc -> 456d74c)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from ea154bc Revert "Claiming RFC for data skipping index for updated version (#4271)" (#4272) add 456d74c [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel (#4178) No new revisions were added by this update. Summary of changes: .../MultipleSparkJobExecutionStrategy.java | 11 +++--- .../util/{Functions.java => FutureUtils.java} | 45 +- 2 files changed, 24 insertions(+), 32 deletions(-) copy hudi-common/src/main/java/org/apache/hudi/common/util/{Functions.java => FutureUtils.java} (50%)
[GitHub] [hudi] yihua merged pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
yihua merged pull request #4178: URL: https://github.com/apache/hudi/pull/4178 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable
hudi-bot commented on pull request #4274: URL: https://github.com/apache/hudi/pull/4274#issuecomment-990655541 ## CI report: * 1e718c4bcfe432a4ac03f807c889c67ee8d962ae Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4154) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable
hudi-bot removed a comment on pull request #4274: URL: https://github.com/apache/hudi/pull/4274#issuecomment-990654497 ## CI report: * 1e718c4bcfe432a4ac03f807c889c67ee8d962ae UNKNOWN
[GitHub] [hudi] Carl-Zhou-CN commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena
Carl-Zhou-CN commented on issue #4267: URL: https://github.com/apache/hudi/issues/4267#issuecomment-990654733 Because of your Hudi version, you may need to update the partitions manually after writing: `ALTER TABLE table_name RECOVER PARTITIONS;`
[GitHub] [hudi] hudi-bot commented on pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable
hudi-bot commented on pull request #4274: URL: https://github.com/apache/hudi/pull/4274#issuecomment-990654497 ## CI report: * 1e718c4bcfe432a4ac03f807c889c67ee8d962ae UNKNOWN
[jira] [Updated] (HUDI-2974) Make the prefix for metrics name configurable
[ https://issues.apache.org/jira/browse/HUDI-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-2974:
Labels: pull-request-available (was: )
> Make the prefix for metrics name configurable
>
> Key: HUDI-2974
> URL: https://issues.apache.org/jira/browse/HUDI-2974
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Rajesh Mahindra
> Priority: Major
> Labels: pull-request-available
>
> Currently, metric names always start with the table name. This makes it less flexible to create Grafana dashboards with Prometheus queries, since it is easier to have consistent metric names across all Spark/DeltaStreamer jobs.
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] rmahindra123 opened a new pull request #4274: [HUDI-2974] Make the prefix for metrics name configurable
rmahindra123 opened a new pull request #4274: URL: https://github.com/apache/hudi/pull/4274 Currently, metric names always start with the table name. This makes it less flexible to create Grafana dashboards with Prometheus queries, since it is easier to have consistent metric names across all Spark/DeltaStreamer jobs. This PR adds a new config for the prefix name, but keeps the default as the table name to ensure compatibility with current deployments.
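For readers following along, the behaviour described in this pull request can be sketched roughly as below. This is only an illustration, not the code from the PR: the function name `metric_name`, the `prefix` parameter, and the dot-separated naming scheme are all assumptions made for the example.

```python
from typing import Optional


def metric_name(table_name: str, metric: str, prefix: Optional[str] = None) -> str:
    """Build a metric name from a prefix and a metric suffix.

    `prefix` is a hypothetical stand-in for the new config. When it is
    unset or empty, the table name is used, which mirrors the
    backwards-compatible default described in the pull request.
    """
    effective_prefix = prefix if prefix else table_name
    # The dot separator is an assumption made for this sketch.
    return f"{effective_prefix}.{metric}"


# Default: the metric name still starts with the table name.
default_name = metric_name("trips", "commit.duration")
# With a shared prefix, every job reports under the same name.
shared_name = metric_name("trips", "commit.duration", prefix="hudi_jobs")
```

With a shared prefix, a single dashboard query could match the same metric across all jobs instead of needing one query per table name.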
[jira] [Created] (HUDI-2974) Make the prefix for metrics name configurable
Rajesh Mahindra created HUDI-2974: Summary: Make the prefix for metrics name configurable Key: HUDI-2974 URL: https://issues.apache.org/jira/browse/HUDI-2974 Project: Apache Hudi Issue Type: Improvement Reporter: Rajesh Mahindra Currently, metric names always start with the table name. This makes it less flexible to create Grafana dashboards with Prometheus queries, since it is easier to have consistent metric names across all Spark/DeltaStreamer jobs.
[GitHub] [hudi] codope merged pull request #4273: [MINOR] Fix asf-site build error
codope merged pull request #4273: URL: https://github.com/apache/hudi/pull/4273
[hudi] branch asf-site updated: [MINOR] Fix asf-site build error (#4273)
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 34e151d [MINOR] Fix asf-site build error (#4273)
34e151d is described below

commit 34e151d3198586544a3864e7e1e70d4be184108c
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Thu Dec 9 22:22:06 2021 -0800

    [MINOR] Fix asf-site build error (#4273)
---
 website/package.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/package.json b/website/package.json
index 0d5f069..fbda5d8 100644
--- a/website/package.json
+++ b/website/package.json
@@ -14,7 +14,7 @@
 "write-heading-ids": "docusaurus write-heading-ids"
 },
 "dependencies": {
-"@docusaurus/core": "2.0.0-beta.3",
+"@docusaurus/core": "^2.0.0-beta.3",
 "@docusaurus/plugin-client-redirects": "^2.0.0-beta.3",
 "@docusaurus/plugin-sitemap": "^2.0.0-beta.3",
 "@docusaurus/preset-classic": "2.0.0-beta.3",
[GitHub] [hudi] hudi-bot removed a comment on pull request #4222: [HUDI-2849] improve SparkUI job description for write path
hudi-bot removed a comment on pull request #4222: URL: https://github.com/apache/hudi/pull/4222#issuecomment-990556714 ## CI report: * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145)
[GitHub] [hudi] hudi-bot commented on pull request #4222: [HUDI-2849] improve SparkUI job description for write path
hudi-bot commented on pull request #4222: URL: https://github.com/apache/hudi/pull/4222#issuecomment-990651217 ## CI report: * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4153)
[GitHub] [hudi] YuweiXiao commented on pull request #4222: [HUDI-2849] improve SparkUI job description for write path
YuweiXiao commented on pull request #4222: URL: https://github.com/apache/hudi/pull/4222#issuecomment-990650172 @hudi-bot run azure
[GitHub] [hudi] xiarixiaoyao commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
xiarixiaoyao commented on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990647589 @vinothchandar @alexeykudinkin @leesf I have updated the code and addressed all comments. Please help review it again, thanks.
[GitHub] [hudi] leesf commented on pull request #3964: [HUDI-2732][RFC-38] Spark Datasource V2 Integration
leesf commented on pull request #3964: URL: https://github.com/apache/hudi/pull/3964#issuecomment-990645306
> > And in the first phase, we would fall back to the V1 write path
>
> Can this be done? Love to see some code for this.

Yes, I will open a PR in the next few days.
[GitHub] [hudi] Carl-Zhou-CN commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena
Carl-Zhou-CN commented on issue #4267: URL: https://github.com/apache/hudi/issues/4267#issuecomment-990644240
"hoodie.datasource.hive_sync.enable": "true",
"hoodie.datasource.hive_sync.table": "my_hudi_table",
"hoodie.datasource.hive_sync.partition_fields": "creation_date",
"hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
@Arun-kc If you do not register your Hudi dataset as a table in the Hive metastore, these options are not required.
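As a rough illustration of the point above, the Hive sync options can be treated as an optional group that is only added when the dataset should be registered in the Hive metastore. The helper name `build_hudi_options` and its parameters are hypothetical and used only for this sketch; the option keys and the extractor class are the ones quoted in this thread.

```python
from typing import Dict


def build_hudi_options(
    table_name: str,
    record_key: str,
    partition_field: str,
    precombine_field: str,
    hive_sync: bool = False,
) -> Dict[str, str]:
    """Assemble Hudi write options, adding Hive sync settings only on request."""
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }
    if hive_sync:
        # Only needed when the table is registered in the Hive metastore.
        options.update({
            "hoodie.datasource.hive_sync.enable": "true",
            "hoodie.datasource.hive_sync.table": table_name,
            "hoodie.datasource.hive_sync.partition_fields": partition_field,
            "hoodie.datasource.hive_sync.partition_extractor_class":
                "org.apache.hudi.hive.MultiPartKeysValueExtractor",
        })
    return options


# Without Hive sync, only the core write options are present.
plain = build_hudi_options("my_hudi_table", "id", "creation_date", "last_update_time")
# With Hive sync, the four hive_sync.* options are added.
synced = build_hudi_options(
    "my_hudi_table", "id", "creation_date", "last_update_time", hive_sync=True
)
```

The resulting dictionary would then be passed to the writer in the usual way; that part is left out here because it depends on the Spark or Glue setup in use.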
[hudi] branch master updated (8321d20 -> ea154bc)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 8321d20 Claiming RFC for data skipping index for updated version (#4271) add ea154bc Revert "Claiming RFC for data skipping index for updated version (#4271)" (#4272) No new revisions were added by this update. Summary of changes: rfc/README.md | 1 - 1 file changed, 1 deletion(-)
[GitHub] [hudi] nsivabalan merged pull request #4272: [MINOR] Revert "Claiming RFC for data skipping index for updated version (#42…
nsivabalan merged pull request #4272: URL: https://github.com/apache/hudi/pull/4272
[GitHub] [hudi] nsivabalan opened a new pull request #4272: [MINOR] Revert "Claiming RFC for data skipping index for updated version (#42…
nsivabalan opened a new pull request #4272: URL: https://github.com/apache/hudi/pull/4272 …71)" This reverts commit 8321d20c2cced15150621c9ad828f5ba9d79399a.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot removed a comment on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990610366 ## CI report: * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147) * 43c1e05bea47d18730eec37c24d94755d291c2f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4150)
[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot commented on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990626291 ## CI report: * 43c1e05bea47d18730eec37c24d94755d291c2f1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4150)
[GitHub] [hudi] Arun-kc commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena
Arun-kc commented on issue #4267: URL: https://github.com/apache/hudi/issues/4267#issuecomment-990615939
@Carl-Zhou-CN The following are the Hudi options I'm using as of now.
```python
hudiOptions = {
    "hoodie.table.name": "my_hudi_table",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "creation_date",
    "hoodie.datasource.write.precombine.field": "last_update_time",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.table": "my_hudi_table",
    "hoodie.datasource.hive_sync.partition_fields": "creation_date",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    "hoodie.index.type": "GLOBAL_BLOOM",  # This is required if we want to ensure we upsert a record, even if the partition changes
    "hoodie.bloom.index.update.partition.path": "true",  # This is required to write the data into the new partition (defaults to false in 0.8.0, true in 0.9.0)
}
```
As for `hoodie.datasource.hive_sync.jdbcurl`, I'm not using Hive as of now, so what URL should I specify? I'm doing this in AWS Glue and using a Hudi connector.
[hudi] branch master updated: Claiming RFC for data skipping index for updated version (#4271)
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 8321d20 Claiming RFC for data skipping index for updated version (#4271)
8321d20 is described below

commit 8321d20c2cced15150621c9ad828f5ba9d79399a
Author: Sivabalan Narayanan
AuthorDate: Thu Dec 9 23:37:42 2021 -0500

    Claiming RFC for data skipping index for updated version (#4271)
---
 rfc/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rfc/README.md b/rfc/README.md
index 6c0b447..fe003d9 100644
--- a/rfc/README.md
+++ b/rfc/README.md
@@ -65,3 +65,4 @@ The list of all RFCs can be found here.
 | 39 | [Incremental source for Debezium](./rfc-39/rfc-39.md) | `IN PROGRESS` |
 | 40 | [Hudi Connector for Trino] | `UNDER REVIEW` |
 | 41 | [Hudi Snowflake Integration] | `UNDER REVIEW` |
+| 42 | [Updated version of Data skipping index] | `UNDER REVIEW` |
\ No newline at end of file
[GitHub] [hudi] codope merged pull request #4271: [HUDI-2973] Claiming RFC number for data skipping index (updated version)
codope merged pull request #4271: URL: https://github.com/apache/hudi/pull/4271
[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
hudi-bot removed a comment on pull request #4270: URL: https://github.com/apache/hudi/pull/4270#issuecomment-990597314 ## CI report: * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148)
[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
hudi-bot commented on pull request #4270: URL: https://github.com/apache/hudi/pull/4270#issuecomment-990611154 ## CI report: * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148)
[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot commented on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990610366 ## CI report: * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147) * 43c1e05bea47d18730eec37c24d94755d291c2f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4150)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot removed a comment on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990609511 ## CI report: * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147) * 43c1e05bea47d18730eec37c24d94755d291c2f1 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot commented on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990609511 ## CI report: * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147) * 43c1e05bea47d18730eec37c24d94755d291c2f1 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot removed a comment on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990597210 ## CI report: * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4271: [HUDI-2973] Claiming RFC number for data skipping index (updated version)
hudi-bot removed a comment on pull request #4271: URL: https://github.com/apache/hudi/pull/4271#issuecomment-990597335 ## CI report: * b089271cd1db1ee41ed34018a9056450194cb900 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4271: [HUDI-2973] Claiming RFC number for data skipping index (updated version)
hudi-bot commented on pull request #4271: URL: https://github.com/apache/hudi/pull/4271#issuecomment-990598889 ## CI report: * b089271cd1db1ee41ed34018a9056450194cb900 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4149)
[GitHub] [hudi] hudi-bot commented on pull request #4271: [HUDI-2973] Claiming RFC number for data skipping index (updated version)
hudi-bot commented on pull request #4271: URL: https://github.com/apache/hudi/pull/4271#issuecomment-990597335 ## CI report: * b089271cd1db1ee41ed34018a9056450194cb900 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
hudi-bot removed a comment on pull request #4270: URL: https://github.com/apache/hudi/pull/4270#issuecomment-990595447 ## CI report: * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
hudi-bot commented on pull request #4270: URL: https://github.com/apache/hudi/pull/4270#issuecomment-990597314 ## CI report: * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4148)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot removed a comment on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990569685 ## CI report: * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146) * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot commented on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990597210 ## CI report: * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-2973) Rewrite/re-publish RFC for Data skipping index
[ https://issues.apache.org/jira/browse/HUDI-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2973: - Labels: pull-request-available (was: ) > Rewrite/re-publish RFC for Data skipping index > -- > > Key: HUDI-2973 > URL: https://issues.apache.org/jira/browse/HUDI-2973 > Project: Apache Hudi > Issue Type: Sub-task > Components: Docs >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
hudi-bot commented on pull request #4270: URL: https://github.com/apache/hudi/pull/4270#issuecomment-990595447 ## CI report: * 7095ede3d5fa162df3804d05c3a1ff009e6f4ef4 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan opened a new pull request #4271: [HUDI-2973] Claiming RFC number for data skipping index (updated version)
nsivabalan opened a new pull request #4271: URL: https://github.com/apache/hudi/pull/4271 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-2973) Rewrite/re-publish RFC for Data skipping index
sivabalan narayanan created HUDI-2973: - Summary: Rewrite/re-publish RFC for Data skipping index Key: HUDI-2973 URL: https://issues.apache.org/jira/browse/HUDI-2973 Project: Apache Hudi Issue Type: Improvement Components: Docs Reporter: sivabalan narayanan -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-2973) Rewrite/re-publish RFC for Data skipping index
[ https://issues.apache.org/jira/browse/HUDI-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-2973: - Assignee: sivabalan narayanan > Rewrite/re-publish RFC for Data skipping index > -- > > Key: HUDI-2973 > URL: https://issues.apache.org/jira/browse/HUDI-2973 > Project: Apache Hudi > Issue Type: Sub-task > Components: Docs >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2973) Rewrite/re-publish RFC for Data skipping index
[ https://issues.apache.org/jira/browse/HUDI-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2973: -- Parent: HUDI-1822 Issue Type: Sub-task (was: Improvement) > Rewrite/re-publish RFC for Data skipping index > -- > > Key: HUDI-2973 > URL: https://issues.apache.org/jira/browse/HUDI-2973 > Project: Apache Hudi > Issue Type: Sub-task > Components: Docs >Reporter: sivabalan narayanan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2811) Support Spark 3.2 and Parquet 1.12.x
[ https://issues.apache.org/jira/browse/HUDI-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2811: - Labels: pull-request-available sev:critical (was: sev:critical) > Support Spark 3.2 and Parquet 1.12.x > > > Key: HUDI-2811 > URL: https://issues.apache.org/jira/browse/HUDI-2811 > Project: Apache Hudi > Issue Type: Improvement > Components: Spark Integration >Reporter: Raymond Xu >Assignee: Yann Byron >Priority: Blocker > Labels: pull-request-available, sev:critical > Fix For: 0.11.0 > > > Reported issues > * [https://github.com/apache/hudi/issues/4001] > * [https://github.com/apache/hudi/issues/3841] > * [https://github.com/apache/hudi/issues/4202] > * [https://github.com/apache/hudi/issues/3834] > * -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] YannByron opened a new pull request #4270: [HUDI-2811] Support Spark 3.2 and Parquet 1.12.x
YannByron opened a new pull request #4270: URL: https://github.com/apache/hudi/pull/4270 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request Support Spark 3.2 and Parquet 1.12.x *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YannByron commented on issue #4208: [SUPPORT] On Hudi 0.9.0 - Alter table throws java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.alterTable(java.lang.String, o
YannByron commented on issue #4208: URL: https://github.com/apache/hudi/issues/4208#issuecomment-990591132 Hi, @BenjMaq I can't reproduce this issue. Can you check your environment? Based on the error above, my guess is that a conflict between jars is causing this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
nsivabalan commented on a change in pull request #3887: URL: https://github.com/apache/hudi/pull/3887#discussion_r766321890 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FileSystemGuardConfig.java ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.fs; + +import org.apache.hudi.common.config.ConfigClassProperty; +import org.apache.hudi.common.config.ConfigGroups; +import org.apache.hudi.common.config.ConfigProperty; +import org.apache.hudi.common.config.HoodieConfig; + +import java.io.File; +import java.io.FileReader; +import java.io.IOException; +import java.util.Properties; + +/** + * The consistency guard relevant config options. + */ +@ConfigClassProperty(name = "FileSystem Guard Configurations", +groupName = ConfigGroups.Names.WRITE_CLIENT, +description = "The filesystem guard related config options, to help deal with runtime exception like s3 list/get/put/delete performance issues.") +public class FileSystemGuardConfig extends HoodieConfig { Review comment: do you think naming this "FileSystemRetryConfig" would be more appropriate? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
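For readers following the review above, the idea in the PR title (retrying a filesystem action rather than failing on the first error) can be illustrated with a generic helper. This is only a sketch under stated assumptions: `RetrySketch`, `retry`, `maxAttempts` and `initialDelayMs` are hypothetical names, and the doubling backoff is an illustrative choice. None of this is taken from PR #3887 or from Hudi's API.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration only; not the class under review in PR #3887.
final class RetrySketch {
  private RetrySketch() {}

  /**
   * Runs the action, retrying on any exception up to maxAttempts times in total.
   * The wait between attempts starts at initialDelayMs and doubles each time.
   */
  static <T> T retry(Callable<T> action, int maxAttempts, long initialDelayMs) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long delayMs = initialDelayMs;
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          // Back off before the next attempt; skipped after the final failure.
          Thread.sleep(delayMs);
          delayMs *= 2;
        }
      }
    }
    // Every attempt failed: surface the most recent error to the caller.
    throw last;
  }
}
```

A caller would wrap a single list/get/put/delete call in `retry(...)`. A real implementation would likely restrict retries to specific exception types instead of catching every `Exception`.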
[GitHub] [hudi] nsivabalan commented on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.
nsivabalan commented on pull request #3887: URL: https://github.com/apache/hudi/pull/3887#issuecomment-990589947 Sure, makes sense if there are other cloud stores that need this retry. Can you please address the feedback already given? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YannByron edited a comment on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
YannByron edited a comment on issue #4154: URL: https://github.com/apache/hudi/issues/4154#issuecomment-990561751 Hey, @BenjMaq I can't reproduce this issue using your sql in both hudi 0.9 and 0.10. I use spark-2.4.4 in [here](https://archive.apache.org/dist/spark/spark-2.4.4/) and hudi in [here](https://mvnrepository.com/artifact/org.apache.hudi/hudi-spark-bundle_2.11/0.9.0). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YannByron edited a comment on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
YannByron edited a comment on issue #4154: URL: https://github.com/apache/hudi/issues/4154#issuecomment-990561751 Hey, @BenjMaq In both hudi 0.9 and 0.10, `insert overwrite` can work well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Rap70r commented on issue #4242: [SUPPORT] Split Data into Multiple Parquet files under Partitions
Rap70r commented on issue #4242: URL: https://github.com/apache/hudi/issues/4242#issuecomment-990582235 Got it, thank you. You mentioned that we can employ clustering to batch lot of small files together. Is there a specific configuration we need to set to achieve that? We are running Hudi in Spark using EMR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YannByron edited a comment on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
YannByron edited a comment on issue #4154: URL: https://github.com/apache/hudi/issues/4154#issuecomment-990561751 Hey, @BenjMaq In both hudi 0.9 and 0.10, `insert overwrite` can work well. My spark version is 2.4.7, but i think it's ok. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Carl-Zhou-CN commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena
Carl-Zhou-CN commented on issue #4267: URL: https://github.com/apache/hudi/issues/4267#issuecomment-990576501 @Arun-kc It feels like a connection problem, please check hoodie.datasource.hive_sync.jdbcurl, it seems to be a default value now -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (HUDI-2903) get table schema from the last commit with data written
[ https://issues.apache.org/jira/browse/HUDI-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yann Byron reassigned HUDI-2903: Assignee: Yann Byron > get table schema from the last commit with data written > --- > > Key: HUDI-2903 > URL: https://issues.apache.org/jira/browse/HUDI-2903 > Project: Apache Hudi > Issue Type: Improvement > Components: Common Core >Reporter: Yann Byron >Assignee: Yann Byron >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > If the last operation is `delete_partition`, the > `HoodieCommitMetadata` object from the last commit will have an empty > `getFileIdAndRelativePaths`, and we can't get the table schema from it. > So, I want to find the last commit to which data was written. -- This message was sent by Atlassian Jira (v8.20.1#820001)
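The lookup described in HUDI-2903 amounts to walking the commits from newest to oldest and stopping at the first one whose file map is non-empty. The sketch below shows that idea in isolation. `CommitInfo` and `lastCommitWithData` are hypothetical stand-ins invented for illustration; they are not Hudi's `HoodieCommitMetadata` or timeline API, and the eventual fix may look quite different.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class LastDataCommitFinder {
  private LastDataCommitFinder() {}

  // Hypothetical stand-in for per-commit metadata (not Hudi's HoodieCommitMetadata).
  static final class CommitInfo {
    final String instantTime;
    final Map<String, String> fileIdAndRelativePaths;

    CommitInfo(String instantTime, Map<String, String> fileIdAndRelativePaths) {
      this.instantTime = instantTime;
      this.fileIdAndRelativePaths = fileIdAndRelativePaths;
    }
  }

  /**
   * Scans commits from newest to oldest and returns the first one that wrote files.
   * Assumes the input list is ordered oldest first.
   */
  static Optional<CommitInfo> lastCommitWithData(List<CommitInfo> commitsOldestFirst) {
    for (int i = commitsOldestFirst.size() - 1; i >= 0; i--) {
      CommitInfo commit = commitsOldestFirst.get(i);
      if (!commit.fileIdAndRelativePaths.isEmpty()) {
        return Optional.of(commit);
      }
    }
    // No commit wrote any data (for example, only delete_partition operations so far).
    return Optional.empty();
  }
}
```

Returning `Optional.empty()` leaves it to the caller to decide what to do when no commit has written data.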
[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot commented on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990569685 ## CI report: * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146) * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4147) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4242: [SUPPORT] Split Data into Multiple Parquet files under Partitions
nsivabalan commented on issue #4242: URL: https://github.com/apache/hudi/issues/4242#issuecomment-990569759 yes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot removed a comment on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990552126 ## CI report: * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146) * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YannByron commented on pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql
YannByron commented on pull request #4269: URL: https://github.com/apache/hudi/pull/4269#issuecomment-990569190 @nsivabalan @xushiyan please help to review this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-2878) Enhance hudi-quick start guide for spark-sql
[ https://issues.apache.org/jira/browse/HUDI-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2878: - Labels: pull-request-available (was: ) > Enhance hudi-quick start guide for spark-sql > > > Key: HUDI-2878 > URL: https://issues.apache.org/jira/browse/HUDI-2878 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs >Reporter: sivabalan narayanan >Assignee: Yann Byron >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > We should try to streamline entire quick start guide using single flow/table > from start to end. As of now, every operations shows 3 to 4 options, but then > when we move to say update, it does not re-use the table from "insert" > section. > > If we look at scala quick start guide, we just use the same table from start > to end. And so, it gives a good end to end run book for users. Where as for > spark-sql, we don't have that now. For instance, if someone wants to try out > delete, they have to create a table by themselves and then go about deleting > based on delete examples given in our quick start guide. > > We need to go over diff ways to do an operation(for eg, create table w/ and > w/o primary keys, etc), but atleast for one table configuration, would be > good to have entire flow covered. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] zztttt commented on issue #4072: [SUPPORT]Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6
zztttt commented on issue #4072: URL: https://github.com/apache/hudi/issues/4072#issuecomment-990568201 > ``` > Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /home/zzt/code/spark-debug/metastore_db. > ``` > > guess you already have another instance of derby running, i.e. another spark-shell running and trying to write to hudi or something. Thanks for your help! I still meet this problem when I want to store the metadata in Hivemetastore by the derby database approach. I address this problem by using the relational database approach of Hivemetastore, and it really works several days ago. I can ensure that there is only one sparkSession instance running in the project, and before I start the project, I delete the metadata_db directory every time, but this doesn't work. It's confusing so I use another storage backend instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YannByron opened a new pull request #4269: [HUDI-2878] enhance hudi-quick-start guide for spark-sql
YannByron opened a new pull request #4269: URL: https://github.com/apache/hudi/pull/4269 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YannByron commented on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
YannByron commented on issue #4154: URL: https://github.com/apache/hudi/issues/4154#issuecomment-990561751 Hey, @BenjMaq I test that it works in version 0.10. Can you use hudi 0.10? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4222: [HUDI-2849] improve SparkUI job description for write path
hudi-bot removed a comment on pull request #4222: URL: https://github.com/apache/hudi/pull/4222#issuecomment-990527370 ## CI report: * a085e101422d1df36b94127e75e5d60716986e69 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4110) * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4222: [HUDI-2849] improve SparkUI job description for write path
hudi-bot commented on pull request #4222: URL: https://github.com/apache/hudi/pull/4222#issuecomment-990556714 ## CI report: * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot removed a comment on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990550517 ## CI report: * c454677b96fab062cf31634426646d741ac9dbe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928) * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146) * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot commented on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990552126 ## CI report: * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146) * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot commented on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990550517 ## CI report: * c454677b96fab062cf31634426646d741ac9dbe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928) * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146) * c4cffc9908f2a8e79f4c24dc566942f2c6d8b752 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot removed a comment on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990534719 ## CI report: * c454677b96fab062cf31634426646d741ac9dbe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928) * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
xiarixiaoyao commented on a change in pull request #4178: URL: https://github.com/apache/hudi/pull/4178#discussion_r766290661 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java ## @@ -246,6 +245,16 @@ public MultipleSparkJobExecutionStrategy(HoodieTable table, HoodieEngineContext }).map(this::transform); } + private static CompletableFuture> allOf(@Nonnull List> futures) { Review comment: ok -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
alexeykudinkin commented on a change in pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#discussion_r766290137

## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java ##
@@ -91,13 +92,11 @@ public MultipleSparkJobExecutionStrategy(HoodieTable table, HoodieEngineContext
     // execute clustering for each group async and collect WriteStatus
     JavaSparkContext engineContext = HoodieSparkEngineContext.getSparkContext(getEngineContext());
     // execute clustering for each group async and collect WriteStatus
-    Stream<JavaRDD<WriteStatus>> writeStatusRDDStream = clusteringPlan.getInputGroups().stream()
+    Stream<JavaRDD<WriteStatus>> writeStatusRDDStream = allOf(clusteringPlan.getInputGroups().stream()
         .map(inputGroup -> runClusteringForGroupAsync(inputGroup,
             clusteringPlan.getStrategy().getStrategyParams(),
             Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false),
-            instantTime))
-        .map(CompletableFuture::join);
-
+            instantTime)).collect(Collectors.toList())).join().stream();

Review comment:
Can you please re-format this snippet to stack up the calls so that it's easy to tell which method is invoked on which expression? Like the following:

```
allOf(
    clusteringPlan.getInputGroups().stream()
        .map(inputGroup ->
            runClusteringForGroupAsync(inputGroup,
                clusteringPlan.getStrategy().getStrategyParams(),
                Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false),
                instantTime)
        )
        .collect(Collectors.toList()))
    .join()
    .stream()
```

It's very hard to understand what is going on there right now.
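For readers following the diff above, here is a minimal sketch of why the old `.map(...).map(CompletableFuture::join)` shape probably ran the groups one at a time. A sequential Java stream pushes each element through the whole pipeline before touching the next one, so the first future is joined before the second is even submitted. `LazyJoinDemo` and `runAsync` are hypothetical stand-ins, not Hudi code; `runAsync` only plays the role of `runClusteringForGroupAsync`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class LazyJoinDemo {
  private LazyJoinDemo() {}

  // Hypothetical stand-in for runClusteringForGroupAsync: counts the submission
  // and returns a future that completes on the common pool.
  static CompletableFuture<Integer> runAsync(int group, AtomicInteger submitted) {
    submitted.incrementAndGet();
    return CompletableFuture.supplyAsync(() -> group * 10);
  }

  // Old shape: submit and join in the same lazy pipeline.
  // Records how many tasks had been submitted at the moment of each join.
  static List<Integer> submittedAtEachJoinOld(int groups) {
    AtomicInteger submitted = new AtomicInteger();
    return IntStream.range(0, groups).boxed()
        .map(g -> runAsync(g, submitted))
        .map(f -> {
          f.join();
          return submitted.get();
        })
        .collect(Collectors.toList());
  }

  // New shape: collect all futures first, then join them.
  static List<Integer> submittedAtEachJoinNew(int groups) {
    AtomicInteger submitted = new AtomicInteger();
    List<CompletableFuture<Integer>> futures = IntStream.range(0, groups).boxed()
        .map(g -> runAsync(g, submitted))
        .collect(Collectors.toList());
    return futures.stream()
        .map(f -> {
          f.join();
          return submitted.get();
        })
        .collect(Collectors.toList());
  }
}
```

With three groups, the old shape sees only one, then two, then three submissions at each successive join, while the collect-first shape has all three submitted before the first join.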
[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot commented on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990534719 ## CI report: * c454677b96fab062cf31634426646d741ac9dbe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928) * 2589cfb570762c4dca5968fae72f9b7948a69f31 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4146) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot removed a comment on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990532933 ## CI report: * c454677b96fab062cf31634426646d741ac9dbe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928) * 2589cfb570762c4dca5968fae72f9b7948a69f31 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
alexeykudinkin commented on a change in pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#discussion_r766289221

## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java ##
@@ -246,6 +245,16 @@ public MultipleSparkJobExecutionStrategy(HoodieTable table, HoodieEngineContext
     }).map(this::transform);
   }

+  private static <T> CompletableFuture<List<T>> allOf(@Nonnull List<CompletableFuture<T>> futures) {

Review comment:
Let's extract this to a common utility `FutureUtil` (into `hudi-common`) so it could be re-used
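As an illustration of the helper under review, an `allOf` that turns a list of futures into one future of a list could be sketched as below. The class name `FutureUtil` is taken from the comment above; the body is an assumption built on the standard `CompletableFuture.allOf`, not the code from the PR, and the `@Nonnull` annotation is left out to keep the sketch dependency-free.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class FutureUtil {
  private FutureUtil() {}

  // Returns a future that completes once every input future has completed,
  // carrying the results in the same order as the input list.
  // If any input fails, the returned future completes exceptionally.
  static <T> CompletableFuture<List<T>> allOf(List<CompletableFuture<T>> futures) {
    return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
        .thenApply(ignored -> futures.stream()
            // Every future is already complete here, so join() does not block.
            .map(CompletableFuture::join)
            .collect(Collectors.toList()));
  }

  public static void main(String[] args) {
    List<CompletableFuture<Integer>> futures = Arrays.asList(
        CompletableFuture.supplyAsync(() -> 1),
        CompletableFuture.supplyAsync(() -> 2));
    System.out.println(allOf(futures).join());
  }
}
```

Because the caller hands over an already materialized list, all tasks are submitted before anything waits on them, which is the property the fix relies on.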
[GitHub] [hudi] hudi-bot commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot commented on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-990532933 ## CI report: * c454677b96fab062cf31634426646d741ac9dbe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928) * 2589cfb570762c4dca5968fae72f9b7948a69f31 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
hudi-bot removed a comment on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-984239555 ## CI report: * c454677b96fab062cf31634426646d741ac9dbe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3917) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3928) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4222: [HUDI-2849] improve SparkUI job description for write path
hudi-bot commented on pull request #4222: URL: https://github.com/apache/hudi/pull/4222#issuecomment-990527370 ## CI report: * a085e101422d1df36b94127e75e5d60716986e69 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4110) * dd0773b261cd2d6d503eaa3e02c93edddcb31093 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4145) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build