[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992199532


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4227)
 
   * 6184de8bc6d18499d0ff49a0ef8f92f8ba20ba6e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992195331


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4227)
 
   
   




[GitHub] [hudi] waywtdcc commented on issue #4249: [SUPPORT]FLINK CDC WRITE HUDI, restart job get exception:org.apache.hudi.org.apache.avro.InvalidAvroMagicException: Not an Avro data file

2021-12-12 Thread GitBox


waywtdcc commented on issue #4249:
URL: https://github.com/apache/hudi/issues/4249#issuecomment-992195627


   > Not an Avro data file
   
   Is the 0.10 release version okay? I see an exception is also reported for 
the 0.10 RC version in #4204.






[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992195331


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4227)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992165761


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
 
   * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4227)
 
   
   




[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


leesf commented on a change in pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#discussion_r767475656



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##
@@ -166,20 +165,28 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
 // Check if the necessary table exists
 boolean tableExists = hoodieHiveClient.doesTableExist(tableName);
 
-// Get the parquet schema for this table looking at the latest commit
-MessageType schema = hoodieHiveClient.getDataSchema();
-
-// Currently HoodieBootstrapRelation does support reading bootstrap MOR rt table,
-// so we disable the syncAsSparkDataSourceTable here to avoid read such kind table
-// by the data source way (which will use the HoodieBootstrapRelation).
-// TODO after we support bootstrap MOR rt table in HoodieBootstrapRelation[HUDI-2071], we can remove this logical.
-if (hoodieHiveClient.isBootstrap()
-&& hoodieHiveClient.getTableType() == HoodieTableType.MERGE_ON_READ
-&& !readAsOptimized) {
-  cfg.syncAsSparkDataSourceTable = false;
+// check if isDeletePartition
+boolean isDeletePartition = hoodieHiveClient.isDeletePartition();

Review comment:
   Rename this variable to `isDropPartition`, and rename the method to `isDropPartition` as well.








[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


leesf commented on a change in pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#discussion_r767475329



##
File path: 
hudi-sync/hudi-dla-sync/src/main/java/org/apache/hudi/dla/HoodieDLAClient.java
##
@@ -287,6 +287,11 @@ public void updatePartitionsToTable(String tableName, List changedPartit
 }
   }
 
+  @Override
+  public void dropPartitionsToTable(String tableName, List partitionsToDelete) {
+throw new UnsupportedOperationException("Not support dropPartitionsToTables yet.");

Review comment:
   The message should say `dropPartitionsToTable` (singular), matching the method name.








[GitHub] [hudi] hudi-bot removed a comment on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4294:
URL: https://github.com/apache/hudi/pull/4294#issuecomment-992158754


   
   ## CI report:
   
   * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225)
 
   * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4226)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4294:
URL: https://github.com/apache/hudi/pull/4294#issuecomment-992183402


   
   ## CI report:
   
   * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4226)
 
   
   




[GitHub] [hudi] Carl-Zhou-CN commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena

2021-12-12 Thread GitBox


Carl-Zhou-CN commented on issue #4267:
URL: https://github.com/apache/hudi/issues/4267#issuecomment-992178782


   I think it is possible, but I am not familiar with Athena. I think that as 
long as Hudi can interact with Glue Catalog, your problem should be solved. You 
may need to ask others to help.
   @nsivabalan Do you have time to help?






[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992158732


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
 
   * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992165710


   
   ## CI report:
   
   * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4223)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4228)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992165761


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
 
   * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4227)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992099257


   
   ## CI report:
   
   * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4223)
 
   
   




[GitHub] [hudi] xiarixiaoyao commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


xiarixiaoyao commented on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992164844


   @hudi-bot run azure






[GitHub] [hudi] zztttt edited a comment on issue #4072: [SUPPORT]Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6

2021-12-12 Thread GitBox


zztttt edited a comment on issue #4072:
URL: https://github.com/apache/hudi/issues/4072#issuecomment-992151992


   > hmmm, seems strange. have you tried giving a diff warehouse dir?
   
   Yes, I have already tried changing the warehouse dir URL, but it didn't 
work. Using a remote metastore may be a better approach. I solved this 
problem by adding the "spark.hadoop." prefix to the Hive configurations that 
are usually placed in hive-site.xml.
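
The workaround above can be sketched roughly as follows. Spark documents that a property set as `spark.hadoop.abc.def=xyz` is passed on as the Hadoop property `abc.def=xyz`; everything else here is an assumption for illustration only: the class `SparkHadoopPrefix`, the method `toSparkConf`, and the metastore URI are hypothetical and do not come from the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns hive-site.xml style properties into
// Spark properties by adding the "spark.hadoop." prefix.
final class SparkHadoopPrefix {
    static final String PREFIX = "spark.hadoop.";

    // Returns a new map whose keys all carry the prefix; keys that
    // already start with it are left unchanged.
    static Map<String, String> toSparkConf(Map<String, String> hiveSiteProps) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : hiveSiteProps.entrySet()) {
            String key = e.getKey();
            out.put(key.startsWith(PREFIX) ? key : PREFIX + key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> hive = new LinkedHashMap<>();
        // Placeholder value; substitute the real metastore address.
        hive.put("hive.metastore.uris", "thrift://metastore-host:9083");
        // Print the entries in the form accepted by spark-submit.
        toSparkConf(hive).forEach((k, v) -> System.out.println("--conf " + k + "=" + v));
    }
}
```

The same prefixed keys could instead be set on the session builder or in spark-defaults.conf; which place is most convenient depends on the deployment.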






[GitHub] [hudi] mincwang commented on issue #4227: [SUPPORT] java.lang.IllegalStateException: Duplicate key Option

2021-12-12 Thread GitBox


mincwang commented on issue #4227:
URL: https://github.com/apache/hudi/issues/4227#issuecomment-992159825


   Hi @yanenze, I also have this problem. Have you pushed a patch to the 
GitHub repository?






[GitHub] [hudi] hudi-bot removed a comment on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4294:
URL: https://github.com/apache/hudi/pull/4294#issuecomment-992132673


   
   ## CI report:
   
   * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225)
 
   * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992081197


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4294:
URL: https://github.com/apache/hudi/pull/4294#issuecomment-992158754


   
   ## CI report:
   
   * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225)
 
   * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4226)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992158732


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
 
   * c556f448e5db4e40fbd5b0a3ab81f3cfa8c30914 UNKNOWN
   
   




[GitHub] [hudi] Arun-kc commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena

2021-12-12 Thread GitBox


Arun-kc commented on issue #4267:
URL: https://github.com/apache/hudi/issues/4267#issuecomment-992158121


   @Carl-Zhou-CN It's ok. 
   
   I have tried `ALTER TABLE ADD PARTITION` before, and it does work, but we 
would have to specify the partitions manually. With a lot of partitions this 
is not a viable solution unless we can automate it. I could write a script to 
do this using boto3; that's doable.
   
   What I was trying to do is let Hudi do this on its own, so that we can 
query the partitions in Athena directly without running any other queries. 
Is that possible?
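
For the manual route mentioned above, a minimal sketch of generating the statement is shown below. It assumes Hive-style partition paths of the form `key=value/key=value`; the class `AddPartitionDdl`, the table name, and the S3 location are hypothetical, and the sketch only builds the SQL text (submitting it to Athena is left out).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds one ALTER TABLE ... ADD PARTITION
// statement covering several Hive-style partition paths.
final class AddPartitionDdl {

    // "year=2021/month=12" -> "year = '2021', month = '12'"
    static String partitionSpec(String relativePath) {
        List<String> parts = new ArrayList<>();
        for (String segment : relativePath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq <= 0 || segment.indexOf('\'') >= 0) {
                // Reject anything that is not a plain key=value segment.
                throw new IllegalArgumentException("Unsupported segment: " + segment);
            }
            parts.add(segment.substring(0, eq) + " = '" + segment.substring(eq + 1) + "'");
        }
        return String.join(", ", parts);
    }

    static String addPartitions(String table, String basePath, List<String> relativePaths) {
        StringBuilder sql = new StringBuilder("ALTER TABLE " + table + " ADD IF NOT EXISTS");
        for (String p : relativePaths) {
            sql.append("\n  PARTITION (").append(partitionSpec(p))
               .append(") LOCATION '").append(basePath).append('/').append(p).append("/'");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Table name and bucket are placeholders.
        System.out.println(addPartitions("my_db.my_table", "s3://my-bucket/my_table",
                Arrays.asList("year=2021/month=11", "year=2021/month=12")));
    }
}
```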






[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-12-12 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1602:
--
Labels: core-flow-ds pull-request-available sev:critical  (was: 
pull-request-available sev:critical)

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0
>
>
> We are running a HUDI deltastreamer on a very complex stream. The schema is 
> deeply nested, with several levels of hierarchy (the Avro schema is around 
> 6600 LOC).
>  
> The version of HUDI that writes the dataset is 0.5-SNAPSHOT, and we recently 
> started attempts to upgrade to the latest. However, the latest HUDI can't 
> read the provided dataset. The exception I get: 
>  
>  
> {code:java}
> Got exception while parsing the arguments:Got exception while parsing the arguments:Found recursive reference in Avro schema, which can not be processed by Spark:{ "type" : "record", "name" : "array", "fields" : [ { "name" : "id", "type" : [ "null", "string" ], "default" : null }, { "name" : "type", "type" : [ "null", "string" ], "default" : null }, { "name" : "exist", "type" : [ "null", "boolean" ], "default" : null } ]}
> Stack trace:org.apache.spark.sql.avro.IncompatibleSchemaException:Found recursive reference in Avro schema, which can not be processed by Spark:{ "type" : "record", "name" : "array", "fields" : [ { "name" : "id", "type" : [ "null", "string" ], "default" : null }, { "name" : "type", "type" : [ "null", "string" ], "default" : null }, { "name" : "exist", "type" : [ "null", "boolean" ], "default" : null } ]}
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at
> 

[jira] [Updated] (HUDI-1850) Read on table fails if the first write to table failed

2021-12-12 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1850:
--
Labels: core-flow-ds pull-request-available release-blocker sev:critical 
spark  (was: pull-request-available release-blocker sev:critical spark)

> Read on table fails if the first write to table failed
> --
>
> Key: HUDI-1850
> URL: https://issues.apache.org/jira/browse/HUDI-1850
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Vaibhav Sinha
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, release-blocker, 
> sev:critical, spark
> Fix For: 0.11.0
>
> Attachments: Screenshot 2021-04-24 at 7.53.22 PM.png
>
>
> {code:java}
> java.util.NoSuchElementException: No value present in Option
>   at org.apache.hudi.common.util.Option.get(Option.java:88) ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromCommitMetadata(TableSchemaResolver.java:215) ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:166) ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:155) ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.MergeOnReadSnapshotRelation.<init>(MergeOnReadSnapshotRelation.scala:65) ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:99) ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:63) ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.10.jar:?]
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
> {code}
> The screenshot shows the files that got created before the write had failed.
>  
> !Screenshot 2021-04-24 at 7.53.22 PM.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-945) Cleanup spillable map files eagerly as part of close

2021-12-12 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-945:
-
Labels: pull-request-available sev:high  (was: pull-request-available 
sev:critical)

> Cleanup spillable map files eagerly as part of close
> 
>
> Key: HUDI-945
> URL: https://issues.apache.org/jira/browse/HUDI-945
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Rajesh Mahindra
>Priority: Major
>  Labels: pull-request-available, sev:high
> Fix For: 0.11.0
>
>
> Currently, files used by the external spillable map are deleted on exit. For 
> spark-streaming/deltastreamer continuous-mode cases, which run several 
> iterations, it is better to eagerly delete the files when the handles using 
> them are closed. 
> We need to eagerly delete the files in the following cases:
>  # HoodieMergeHandle
>  # HoodieMergedLogRecordScanner
>  # SpillableMapBasedFileSystemView
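
The idea of deleting on close rather than on exit can be sketched roughly as follows. This is a minimal illustration using only the JDK, not Hudi's actual implementation: `SpillFileHandle`, `spill` and `path` are hypothetical names, and the temp file stands in for whatever the spillable map writes to disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a disk-backed structure; not Hudi's ExternalSpillableMap. */
final class SpillFileHandle implements AutoCloseable {
  private final Path spillFile;

  SpillFileHandle() throws IOException {
    // Create the backing file in the JVM temp directory.
    this.spillFile = Files.createTempFile("spill-", ".data");
  }

  Path path() {
    return spillFile;
  }

  void spill(String record) throws IOException {
    // Overwrites the file with the given record; enough for illustration.
    Files.write(spillFile, record.getBytes(StandardCharsets.UTF_8));
  }

  @Override
  public void close() throws IOException {
    // Eager cleanup: remove the file now instead of waiting for JVM exit.
    Files.deleteIfExists(spillFile);
  }

  public static void main(String[] args) throws IOException {
    Path leftover;
    try (SpillFileHandle handle = new SpillFileHandle()) {
      handle.spill("record-1");
      leftover = handle.path();
    }
    System.out.println("file exists after close: " + Files.exists(leftover));
  }
}
```

In a long-running loop, each iteration would open and close its own handle, so spill files would not pile up for the lifetime of the process.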





[jira] [Updated] (HUDI-1607) Decimal handling bug in SparkAvroPostProcessor

2021-12-12 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1607:
--
Labels: core-flow-ds sev:critical user-support-issues  (was: sev:critical 
user-support-issues)

> Decimal handling bug in SparkAvroPostProcessor 
> ---
>
> Key: HUDI-1607
> URL: https://issues.apache.org/jira/browse/HUDI-1607
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jingwei Zhang
>Priority: Major
>  Labels: core-flow-ds, sev:critical, user-support-issues
>
> This issue is related to 
> [HUDI-1343|https://github.com/apache/hudi/pull/2192].
> I think the purpose of HUDI-1343 was to bridge the difference between avro 
> 1.8.2 (used by hudi) and avro 1.9.2 (used by the upstream system) through an 
> internal Struct type, in particular the incompatible ways those two versions 
> express nullable types. 
> It all worked until I hit the Decimal type. Since a decimal can be backed by 
> either FIXED or BYTES, if an avro schema contains a decimal type with BYTES as 
> its underlying type, after this two-way conversion its underlying type becomes 
> FIXED instead. This causes an exception to be thrown in AvroConversionHelper, 
> as the data underneath is a HeapByteBuffer rather than a GenericFixed.
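
To make the FIXED versus BYTES distinction concrete, here is a small JDK-only sketch. It assumes the decimal logical type stores the two's-complement, big-endian unscaled value, with the FIXED form sign-extended to the declared size. `DecimalEncodingSketch` and its methods are hypothetical names; no Avro classes are used.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

/** Hypothetical illustration of the two containers a decimal can use. */
final class DecimalEncodingSketch {
  /** BYTES-style: minimal-length two's-complement of the unscaled value. */
  static byte[] toBytes(BigDecimal value) {
    return value.unscaledValue().toByteArray();
  }

  /** FIXED-style: the same bytes, sign-extended on the left to exactly `size` bytes. */
  static byte[] toFixed(BigDecimal value, int size) {
    byte[] unscaled = value.unscaledValue().toByteArray();
    if (unscaled.length > size) {
      throw new IllegalArgumentException("value does not fit in " + size + " bytes");
    }
    byte[] fixed = new byte[size];
    // Fill with 0xFF for negatives and 0x00 otherwise, then copy right-aligned.
    Arrays.fill(fixed, (byte) (value.signum() < 0 ? 0xFF : 0x00));
    System.arraycopy(unscaled, 0, fixed, size - unscaled.length, unscaled.length);
    return fixed;
  }

  /** Either container decodes the same way once the scale is known from the schema. */
  static BigDecimal decode(byte[] raw, int scale) {
    return new BigDecimal(new BigInteger(raw), scale);
  }
}
```

The numeric value survives either encoding, but the container differs, which is why a reader expecting one container type fails when handed the other.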





[GitHub] [hudi] zztttt commented on issue #4072: [SUPPORT]Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6

2021-12-12 Thread GitBox


zztttt commented on issue #4072:
URL: https://github.com/apache/hudi/issues/4072#issuecomment-992151992


   > hmmm, seems strange. have you tried giving a diff warehouse dir?
   
   Yes, I have already tried changing the warehouse dir URL, but it didn't 
work. Using a remote metastore may be a better approach, and I got that working 
by adding the "spark.hadoop." prefix to the Hive configurations that are 
usually placed in hive-site.xml. 
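
As a rough illustration of the prefixing described above, the following JDK-only sketch turns hive-site.xml style entries into `--conf` arguments. `SparkHadoopConfArgs` and `toSparkConfArgs` are hypothetical helpers, and the metastore host in the usage note is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; it only builds strings and does not touch Spark itself. */
final class SparkHadoopConfArgs {
  /** Prefixes each Hive/Hadoop key with "spark.hadoop." and emits --conf pairs. */
  static List<String> toSparkConfArgs(Map<String, String> hiveSiteEntries) {
    List<String> args = new ArrayList<>();
    for (Map.Entry<String, String> entry : hiveSiteEntries.entrySet()) {
      args.add("--conf");
      args.add("spark.hadoop." + entry.getKey() + "=" + entry.getValue());
    }
    return args;
  }
}
```

For example, an entry `hive.metastore.uris` with the value `thrift://metastore-host:9083` would become `--conf spark.hadoop.hive.metastore.uris=thrift://metastore-host:9083`.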


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2374) AvroDFSSource does not use the overridden schema to deserialize Avro binaries.

2021-12-12 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2374:
--
Labels: core-flow-ds sev:critical  (was: sev:critical)

> AvroDFSSource does not use the overridden schema to deserialize Avro binaries.
> --
>
> Key: HUDI-2374
> URL: https://issues.apache.org/jira/browse/HUDI-2374
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Affects Versions: 0.9.0
>Reporter: Xuan Huy Pham
>Assignee: Alexey Kudinkin
>Priority: Major
>  Labels: core-flow-ds, sev:critical
> Fix For: 0.11.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Hi,
> I am not sure if the AvroDFSSource is intended to ignore the source schema 
> from designated schema provider class, but the current logic always uses the 
> Avro writer schema as reader schema.
>  Logic as of release-0.9.0, Class: 
> {{org.apache.hudi.utilities.sources.AvroDFSSource}}
> {code:java}
> public class AvroDFSSource extends AvroSource {
>   private final DFSPathSelector pathSelector;
>
>   public AvroDFSSource(TypedProperties props, JavaSparkContext sparkContext, SparkSession sparkSession,
>       SchemaProvider schemaProvider) throws IOException {
>     super(props, sparkContext, sparkSession, schemaProvider);
>     this.pathSelector = DFSPathSelector
>         .createSourceSelector(props, sparkContext.hadoopConfiguration());
>   }
>
>   @Override
>   protected InputBatch<JavaRDD<GenericRecord>> fetchNewData(Option<String> lastCkptStr, long sourceLimit) {
>     Pair<Option<String>, String> selectPathsWithMaxModificationTime =
>         pathSelector.getNextFilePathsAndMaxModificationTime(sparkContext, lastCkptStr, sourceLimit);
>     return selectPathsWithMaxModificationTime.getLeft()
>         .map(pathStr -> new InputBatch<>(Option.of(fromFiles(pathStr)), selectPathsWithMaxModificationTime.getRight()))
>         .orElseGet(() -> new InputBatch<>(Option.empty(), selectPathsWithMaxModificationTime.getRight()));
>   }
>
>   private JavaRDD<GenericRecord> fromFiles(String pathStr) {
>     sparkContext.setJobGroup(this.getClass().getSimpleName(), "Fetch Avro data from files");
>     JavaPairRDD<AvroKey, NullWritable> avroRDD = sparkContext.newAPIHadoopFile(pathStr, AvroKeyInputFormat.class,
>         AvroKey.class, NullWritable.class, sparkContext.hadoopConfiguration());
>     return avroRDD.keys().map(r -> ((GenericRecord) r.datum()));
>   }
> }
> {code}
> The {{schemaProvider}} parameter is only forwarded to the superclass 
> constructor and is never applied when reading, so {{AvroKeyInputFormat}} 
> always uses the writer schema to read.
> As a result, we often see this in the DeltaStreamer logs:
> {code:java}
> 21/08/30 10:17:24 WARN AvroKeyInputFormat: Reader schema was not set. Use 
> AvroJob.setInputKeySchema() if desired.
> 21/08/30 10:17:24 INFO AvroKeyInputFormat: Using a reader schema equal to the 
> writer schema.
> {code}
> This blog post, [https://hudi.apache.org/blog/2021/08/16/kafka-custom-deserializer], 
> describes how AvroKafkaSource supports BACKWARD_TRANSITIVE 
> schema evolution. For DFS data, I see this as the main blocker. If we pass 
> the source schema from {{schemaProvider}}, we should be able to get the same 
> BACKWARD_TRANSITIVE schema evolution feature for DFS avro data.
>  
> Suggested Fix: Pass the source schema from {{schemaProvider}} to the Hadoop 
> configuration key {{avro.schema.input.key}}
>  
>  





[jira] [Updated] (HUDI-2986) Deltastreamer continuous mode run into Too many open files exception

2021-12-12 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2986:
--
Labels: core-flow-ds sev:critical  (was: sev:critical)

> Deltastreamer continuous mode run into Too many open files exception
> 
>
> Key: HUDI-2986
> URL: https://issues.apache.org/jira/browse/HUDI-2986
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer, Writer Core
>Reporter: Raymond Xu
>Priority: Blocker
>  Labels: core-flow-ds, sev:critical
>
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 6 in stage 35202.0 failed 4 times, most recent failure: Lost task 6.3 in 
> stage 35202.0 (TID 1172485, ip-10-211-53-165.infra.usw2.zdsys.com, executor 
> 1): java.io.FileNotFoundException: 
> /mnt/yarn/usercache/hadoop/appcache/application_1638666447607_0001/blockmgr-3725bb05-2c9a-4073-80f6-4eaa335321c9/34/temp_shuffle_8f675a83-21ac-4908-b8da-1c8e25a59b8e
>  (Too many open files)
>   at java.io.FileOutputStream.open0(Native Method)
>   at java.io.FileOutputStream.open(FileOutputStream.java:270)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:106)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:119)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:251)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:157)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:95)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2136)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2124)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2123)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2123)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:994)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:994)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:994)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2384)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2333)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2322)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:805)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2097)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2194)
>   at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1143)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
>   at org.apache.spark.rdd.RDD.fold(RDD.scala:1137)
>   at 
> org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply$mcD$sp(DoubleRDDFunctions.scala:35)
>   at 
> org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35)
>   at 
> org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$sum$1.apply(DoubleRDDFunctions.scala:35)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   

[GitHub] [hudi] yihua commented on issue #4230: [SUPPORT] org.apache.hudi.exception.HoodieRemoteException: Failed to create marker file

2021-12-12 Thread GitBox


yihua commented on issue #4230:
URL: https://github.com/apache/hudi/issues/4230#issuecomment-992150600


   @BenjMaq Could you try adding this config to disable timeline-server-based 
markers and check if the insert is successful?
   ```
   set hoodie.write.markers.type=direct;
   ```
   The problem is likely due to a missing or failed timeline server during the 
insert operation in Spark SQL.  I'm going to look into the root cause.






[GitHub] [hudi] Carl-Zhou-CN commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena

2021-12-12 Thread GitBox


Carl-Zhou-CN commented on issue #4267:
URL: https://github.com/apache/hudi/issues/4267#issuecomment-992150191


   @Arun-kc Sorry, it seems I misunderstood. What needs to be done is 
`ALTER TABLE ADD PARTITION`:
   
![image](https://user-images.githubusercontent.com/67902676/145762846-007866d1-1bfe-46fe-b082-66e723007b92.png)
   
![image](https://user-images.githubusercontent.com/67902676/145762903-7599f309-b5b0-4eac-9e7c-d3b9ecafe9b8.png)
   
   
https://docs.aws.amazon.com/athena/latest/ug/querying-hudi.html#querying-hudi-in-athena-creating-hudi-tables






[jira] [Created] (HUDI-2996) Flink streaming reader 'skip_compaction' option does not work

2021-12-12 Thread Danny Chen (Jira)
Danny Chen created HUDI-2996:


 Summary: Flink streaming reader 'skip_compaction' option does not 
work
 Key: HUDI-2996
 URL: https://issues.apache.org/jira/browse/HUDI-2996
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Affects Versions: 0.10.0
Reporter: Danny Chen
 Fix For: 0.11.0








[GitHub] [hudi] laurieliyang commented on pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.

2021-12-12 Thread GitBox


laurieliyang commented on pull request #3859:
URL: https://github.com/apache/hudi/pull/3859#issuecomment-992145998


   > @laurieliyang Thanks for fixing the Chinese docs. Could you fix the 
conflicts with the latest asf-site?
   
   I have fixed the conflicts in `overview.md`.






[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


leesf commented on a change in pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#discussion_r767437932



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/JDBCExecutor.java
##
@@ -141,6 +142,11 @@ private String getHiveJdbcUrlWithDefaultDBName(String 
jdbcUrl) {
 }
   }
 
+  @Override
+  public void dropPartitionsToTable(String tableName, List<String> partitionsToDelete) {
+    throw new UnsupportedOperationException("No support for dropPartitionsToTable");

Review comment:
   why not support in jdbc mode?








[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


leesf commented on a change in pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#discussion_r767437932



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/JDBCExecutor.java
##
@@ -141,6 +142,11 @@ private String getHiveJdbcUrlWithDefaultDBName(String 
jdbcUrl) {
 }
   }
 
+  @Override
+  public void dropPartitionsToTable(String tableName, List<String> partitionsToDelete) {
+    throw new UnsupportedOperationException("No support for dropPartitionsToTable");

Review comment:
   not support in jdbc mode?








[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


leesf commented on a change in pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#discussion_r767437393



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##
@@ -122,6 +122,14 @@ public void updatePartitionsToTable(String tableName, 
List changedPartit
 ddlExecutor.updatePartitionsToTable(tableName, changedPartitions);
   }
 
+  /**
+   * Partition path has changed - drop the following partitions.
+   */
+  @Override
+  public void dropPartitionsToTable(String tableName, List<String> partitionsToDelete) {

Review comment:
   partitionsToDrop








[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


leesf commented on a change in pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#discussion_r767437625



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##
@@ -147,6 +155,14 @@ public void updateTableProperties(String tableName, 
Map tablePro
* Generate a list of PartitionEvent based on the changes required.
*/
   List<PartitionEvent> getPartitionEvents(List<Partition> tablePartitions, List<String> partitionStoragePartitions) {
+    return getPartitionEvents(tablePartitions, partitionStoragePartitions, false);
+  }
+
+  /**
+   * Iterate over the storage partitions and find if there are any new partitions that need to be added or updated.
+   * Generate a list of PartitionEvent based on the changes required.
+   */
+  List<PartitionEvent> getPartitionEvents(List<Partition> tablePartitions, List<String> partitionStoragePartitions, boolean isDeletePartition) {

Review comment:
   ditto








[GitHub] [hudi] hudi-bot commented on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4259:
URL: https://github.com/apache/hudi/pull/4259#issuecomment-992142842


   
   ## CI report:
   
   * a4e9d227602017b1b1db0d2ef706afad0ea09158 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4224)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4259:
URL: https://github.com/apache/hudi/pull/4259#issuecomment-992123974


   
   ## CI report:
   
   * c9d8d403526f3f562283a2f64c4f4f7bddfee07b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4141)
 
   * a4e9d227602017b1b1db0d2ef706afad0ea09158 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4224)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


leesf commented on a change in pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#discussion_r767436889



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##
@@ -331,19 +338,32 @@ private boolean syncSchema(String tableName, boolean 
tableExists, boolean useRea
* Syncs the list of storage partitions passed in (checks if the partition 
is in hive, if not adds it or if the
* partition path does not match, it updates the partition path).
*/
-  private boolean syncPartitions(String tableName, List<String> writtenPartitionsSince) {
+  private boolean syncPartitions(String tableName, List<String> writtenPartitionsSince, boolean isDeletePartition) {

Review comment:
   rename to `isDropPartition` to align with PartitionEventType.DROP?








[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


leesf commented on a change in pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#discussion_r767436334



##
File path: 
hudi-sync/hudi-dla-sync/src/main/java/org/apache/hudi/dla/HoodieDLAClient.java
##
@@ -287,6 +287,11 @@ public void updatePartitionsToTable(String tableName, 
List changedPartit
 }
   }
 
+  @Override
+  public void dropPartitionsToTable(String tableName, List<String> partitionsToDelete) {
+    throw new UnsupportedOperationException("No support for dropPartitionsToTable");

Review comment:
   Not support dropPartitionsToTables yet.








[GitHub] [hudi] leesf commented on a change in pull request #4291: [HUDI-2990] Sync to HMS when deleting partitions

2021-12-12 Thread GitBox


leesf commented on a change in pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#discussion_r767435793



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##
@@ -414,6 +414,25 @@ public Schema getLatestSchema(Schema writeSchema, boolean 
convertTableSchemaToAd
 return latestSchema;
   }
 
+
+  /**
+   * Get Last commit's Metadata.
+   */
+  public HoodieCommitMetadata getLatestCommitMetadata() {

Review comment:
   use `Option`
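
   A rough sketch of what the suggestion could look like, using `java.util.Optional` as a stand-in for Hudi's own `Option` type. `LatestCommitLookup` and `latestCommit` are hypothetical names, and plain strings stand in for commit metadata.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration of returning an optional instead of null. */
final class LatestCommitLookup {
  /** Returns the last entry, or an empty Optional when there are no commits yet. */
  static Optional<String> latestCommit(List<String> completedCommits) {
    if (completedCommits.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(completedCommits.get(completedCommits.size() - 1));
  }
}
```

   Callers then have to handle the empty-timeline case explicitly instead of risking a null dereference.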








[jira] [Updated] (HUDI-2990) Sync to HMS when deleting partitions

2021-12-12 Thread Forward Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Forward Xu updated HUDI-2990:
-
Summary: Sync to HMS when deleting partitions  (was: Delete partitions 
without metadata sync to hms)

> Sync to HMS when deleting partitions
> 
>
> Key: HUDI-2990
> URL: https://issues.apache.org/jira/browse/HUDI-2990
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [hudi] hudi-bot removed a comment on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4294:
URL: https://github.com/apache/hudi/pull/4294#issuecomment-992131759


   
   ## CI report:
   
   * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225)
 
   * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4294:
URL: https://github.com/apache/hudi/pull/4294#issuecomment-992132673


   
   ## CI report:
   
   * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225)
 
   * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-2994) Add judgement to existed partitionPath in the catch code block for HUDI-2743

2021-12-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2994:
-
Labels: pull-request-available  (was: )

> Add judgement to existed partitionPath in the catch code block for HUDI-2743
> 
>
> Key: HUDI-2994
> URL: https://issues.apache.org/jira/browse/HUDI-2994
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Affects Versions: 0.10.0
>Reporter: WangMinChao
>Assignee: WangMinChao
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-12-13-13-25-33-402.png
>
>
> !image-2021-12-13-13-25-33-402.png!





[GitHub] [hudi] hudi-bot removed a comment on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4294:
URL: https://github.com/apache/hudi/pull/4294#issuecomment-992130078


   
   ## CI report:
   
   * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4294: [HUDI-2994] Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4294:
URL: https://github.com/apache/hudi/pull/4294#issuecomment-992131759


   
   ## CI report:
   
   * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225)
 
   * bcc67932c21d90d73c2f85bc4bc08a35411ae6f6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] Arun-kc commented on issue #4267: [SUPPORT] Hudi partition values not getting reflected in Athena

2021-12-12 Thread GitBox


Arun-kc commented on issue #4267:
URL: https://github.com/apache/hudi/issues/4267#issuecomment-992131104


   @Carl-Zhou-CN 
   I tried `ALTER TABLE table_name RECOVER PARTITIONS;`, but it's not working.
   
   
![image](https://user-images.githubusercontent.com/22231409/145756912-c82b44ee-8d03-4802-8412-ddf2919aa766.png)
   
   I also tried setting `hoodie.datasource.hive_sync.use_jdbc` to `false`, but 
to no avail.
   
   @nikita-sheremet-clearscale 
   Yes, I'm using Glue in this scenario. I'm using a hudi connector that I 
subscribed to when its version was 0.4. The marketplace now shows the version as 
0.9.0. I'm not sure if the subscribed version gets updated automatically. 
   
   I will check on the IP part and will let you know.
   
   Just to let you know, I'm creating the hudi table manually in Athena 
using the following DDL:
   ```sql
   CREATE EXTERNAL TABLE `my_hudi_table`(
 `_hoodie_commit_time` string, 
 `_hoodie_commit_seqno` string, 
 `_hoodie_record_key` string, 
 `_hoodie_partition_path` string, 
 `_hoodie_file_name` string, 
 `id` string, 
 `last_update_time` string)
   PARTITIONED BY ( 
 `creation_date` string)
   ROW FORMAT SERDE 
 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
   STORED AS INPUTFORMAT 
 'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
   OUTPUTFORMAT 
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
   LOCATION
 's3:///tmp/myhudidataset_001'
   ```






[GitHub] [hudi] yihua commented on a change in pull request #4141: [HUDI-2815] Support partial update for streaming change logs

2021-12-12 Thread GitBox


yihua commented on a change in pull request #4141:
URL: https://github.com/apache/hudi/pull/4141#discussion_r767428311



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdateWithLatestAvroPayload.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.Properties;
+
+import static org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro;
+
+/**
+ * The only difference with {@link DefaultHoodieRecordPayload} is that support update partial fields
+ * in latest record to old record instead of all fields.

Review comment:
   nit: do you want to give a concrete example here to illustrate the 
operation of `combineAndGetUpdateValue()`?








[jira] [Updated] (HUDI-2815) Support partial update for streaming change logs

2021-12-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2815:
-
Labels: pull-request-available  (was: )

> Support partial update for streaming change logs
> 
>
> Key: HUDI-2815
> URL: https://issues.apache.org/jira/browse/HUDI-2815
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
>
> See issue: https://github.com/apache/hudi/issues/4030



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] yihua commented on a change in pull request #4141: [HUDI-2815] Support partial update for streaming change logs

2021-12-12 Thread GitBox


yihua commented on a change in pull request #4141:
URL: https://github.com/apache/hudi/pull/4141#discussion_r767427510



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdateWithLatestAvroPayload.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.Properties;
+
+import static org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro;
+
+/**
+ * The only difference with {@link DefaultHoodieRecordPayload} is that support update partial fields
+ * in latest record to old record instead of all fields.
+ */
+public class PartialUpdateWithLatestAvroPayload extends DefaultHoodieRecordPayload {
+
+  public PartialUpdateWithLatestAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties) throws IOException {
+    if (recordBytes.length == 0) {
+      return Option.of(currentValue);
+    }
+
+    GenericRecord incomingRecord = bytesToAvro(recordBytes, schema);
+
+    // Null check is needed here to support schema evolution. The record in storage may be from old schema where
+    // the new ordering column might not be present and hence returns null.
+    if (!needUpdatingPersistedRecord(currentValue, incomingRecord, properties)) {
+      return Option.of(currentValue);
+    }
+
+    if (isDeleteRecord(incomingRecord)) {
+      return Option.empty();
+    }
+
+    GenericRecord currentRecord = (GenericRecord) currentValue;
+    // The field num in updated record may be less than old record, so only update these partial fields to old record.
+    List<Schema.Field> fields = schema.getFields();
+    fields.forEach(field -> {
+      Object value = incomingRecord.get(field.name());
+      if (Objects.nonNull(value)) {
+        currentRecord.put(field.name(), value);
+      }

Review comment:
   The difference compared to DefaultHoodieRecordPayload is that a field in 
the existing record is overridden only if the incoming record has a non-null 
value for it, instead of being overridden with null.  Is that correct?  The 
docs above are a bit confusing.
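
   To make the question concrete, here is a minimal sketch of the merge rule 
as it reads in the quoted hunk. It is an illustration under assumptions, not 
the PR's code: plain `Map`s stand in for Avro records, the names 
`PartialUpdateSketch` and `mergeNonNull` are hypothetical, the ordering-field 
and delete checks are left out, and it returns a copy where the quoted code 
updates the current record in place.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the merge step; plain maps replace Avro records. */
final class PartialUpdateSketch {

  /**
   * Copies every non-null incoming field over the stored record.
   * Fields that are null in the incoming record keep their stored value.
   */
  static Map<String, Object> mergeNonNull(Map<String, Object> stored, Map<String, Object> incoming) {
    Map<String, Object> merged = new LinkedHashMap<>(stored);
    incoming.forEach((name, value) -> {
      if (value != null) {
        merged.put(name, value);
      }
    });
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> stored = new LinkedHashMap<>();
    stored.put("id", "1");
    stored.put("name", "alice");
    stored.put("city", "Paris");

    Map<String, Object> incoming = new LinkedHashMap<>();
    incoming.put("id", "1");
    incoming.put("name", null);      // not supplied by the change log
    incoming.put("city", "Berlin");  // updated field

    // prints {id=1, name=alice, city=Berlin}
    System.out.println(mergeNonNull(stored, incoming));
  }
}
```

   With a payload that replaces the whole record, `name` would end up null in 
this example; with the partial-update rule it keeps the stored value.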








[GitHub] [hudi] hudi-bot removed a comment on pull request #4294: Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4294:
URL: https://github.com/apache/hudi/pull/4294#issuecomment-992129272


   
   ## CI report:
   
   * 8c68cfecef8fc2892da4d332d2c2993a0460cdac UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4294: Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4294:
URL: https://github.com/apache/hudi/pull/4294#issuecomment-992130078


   
   ## CI report:
   
   * 8c68cfecef8fc2892da4d332d2c2993a0460cdac Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4225)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-2962) Support JVM based local process lock provider implementation

2021-12-12 Thread Manoj Govindassamy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HUDI-2962:
-
Summary: Support JVM based local process lock provider implementation  
(was: Enable metadata table along with JVM local lock provider)

> Support JVM based local process lock provider implementation
> 
>
> Key: HUDI-2962
> URL: https://issues.apache.org/jira/browse/HUDI-2962
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Metadata table is disabled by default in master due to 
> https://issues.apache.org/jira/browse/HUDI-2961. 
>  
> For the single writer + async table services deployment model, to protect 
> against races, we can have a fairly lightweight JVM local lock provider. 
> This means all the writes and the table services have to run in a 
> single JVM, as in the case of DeltaStreamer.  This doesn't cover multi-JVM 
> writes and async table services, though, and a full fix for those will be 
> covered by HUDI-2961. For now, to have the metadata table re-enabled on 
> master, a JVM local lock provider should be sufficient. 





[GitHub] [hudi] hudi-bot commented on pull request #4294: Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4294:
URL: https://github.com/apache/hudi/pull/4294#issuecomment-992129272


   
   ## CI report:
   
   * 8c68cfecef8fc2892da4d332d2c2993a0460cdac UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Created] (HUDI-2995) Enable metadata table by default

2021-12-12 Thread Manoj Govindassamy (Jira)
Manoj Govindassamy created HUDI-2995:


 Summary: Enable metadata table by default
 Key: HUDI-2995
 URL: https://issues.apache.org/jira/browse/HUDI-2995
 Project: Apache Hudi
  Issue Type: Task
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy
 Fix For: 0.11.0


Metadata table was disabled by default due to 
https://issues.apache.org/jira/browse/HUDI-2961

 

The interim workaround is to have a JVM-based local process lock provider:

https://issues.apache.org/jira/browse/HUDI-2962. With this we can turn on the 
metadata table by default.

 





[GitHub] [hudi] mincwang opened a new pull request #4294: Add judgement to existed partitionPath in the catch code block for HU…

2021-12-12 Thread GitBox


mincwang opened a new pull request #4294:
URL: https://github.com/apache/hudi/pull/4294


   …DI-2743
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-2994) Add judgement to existed partitionPath in the catch code block for HUDI-2743

2021-12-12 Thread WangMinChao (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangMinChao updated HUDI-2994:
--
Summary: Add judgement to existed partitionPath in the catch code block for 
HUDI-2743  (was: Add judge existed partitionPath in the catch code block for 
HUDI-2743)

> Add judgement to existed partitionPath in the catch code block for HUDI-2743
> 
>
> Key: HUDI-2994
> URL: https://issues.apache.org/jira/browse/HUDI-2994
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Affects Versions: 0.10.0
>Reporter: WangMinChao
>Assignee: WangMinChao
>Priority: Major
> Attachments: image-2021-12-13-13-25-33-402.png
>
>
> !image-2021-12-13-13-25-33-402.png!





[jira] [Updated] (HUDI-2994) Add judge existed partitionPath in the catch code block for HUDI-2743

2021-12-12 Thread WangMinChao (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangMinChao updated HUDI-2994:
--
Summary: Add judge existed partitionPath in the catch code block for 
HUDI-2743  (was: Add existed partitionPath process in the catch code block for 
HUDI-2743)

> Add judge existed partitionPath in the catch code block for HUDI-2743
> -
>
> Key: HUDI-2994
> URL: https://issues.apache.org/jira/browse/HUDI-2994
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Affects Versions: 0.10.0
>Reporter: WangMinChao
>Assignee: WangMinChao
>Priority: Major
> Attachments: image-2021-12-13-13-25-33-402.png
>
>
> !image-2021-12-13-13-25-33-402.png!





[GitHub] [hudi] hudi-bot commented on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4259:
URL: https://github.com/apache/hudi/pull/4259#issuecomment-992123974


   
   ## CI report:
   
   * c9d8d403526f3f562283a2f64c4f4f7bddfee07b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4141)
   * a4e9d227602017b1b1db0d2ef706afad0ea09158 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4224)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4259:
URL: https://github.com/apache/hudi/pull/4259#issuecomment-992122965


   
   ## CI report:
   
   * c9d8d403526f3f562283a2f64c4f4f7bddfee07b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4141)
   * a4e9d227602017b1b1db0d2ef706afad0ea09158 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Created] (HUDI-2994) Add existed partitionPath process in the catch code block for HUDI-2743

2021-12-12 Thread WangMinChao (Jira)
WangMinChao created HUDI-2994:
-

 Summary: Add existed partitionPath process in the catch code block 
for HUDI-2743
 Key: HUDI-2994
 URL: https://issues.apache.org/jira/browse/HUDI-2994
 Project: Apache Hudi
  Issue Type: Bug
  Components: Common Core
Affects Versions: 0.10.0
Reporter: WangMinChao
Assignee: WangMinChao
 Attachments: image-2021-12-13-13-25-33-402.png

!image-2021-12-13-13-25-33-402.png!





[GitHub] [hudi] yihua commented on pull request #3776: [HUDI-2543]: Added guides section

2021-12-12 Thread GitBox


yihua commented on pull request #3776:
URL: https://github.com/apache/hudi/pull/3776#issuecomment-992122994


   @pratyakshsharma any update on the nit?






[GitHub] [hudi] hudi-bot commented on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4259:
URL: https://github.com/apache/hudi/pull/4259#issuecomment-992122965


   
   ## CI report:
   
   * c9d8d403526f3f562283a2f64c4f4f7bddfee07b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4141)
   * a4e9d227602017b1b1db0d2ef706afad0ea09158 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4259:
URL: https://github.com/apache/hudi/pull/4259#issuecomment-990409450


   
   ## CI report:
   
   * c9d8d403526f3f562283a2f64c4f4f7bddfee07b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4141)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] manojpec commented on a change in pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations

2021-12-12 Thread GitBox


manojpec commented on a change in pull request #4259:
URL: https://github.com/apache/hudi/pull/4259#discussion_r767420453



##
File path: 
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/transaction/TestLocalProcessLockProvider.java
##
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.transaction;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hudi.client.transaction.lock.LocalProcessLockProvider;
+import org.apache.hudi.common.config.LockConfiguration;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.exception.HoodieLockException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.Test;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+
+public class TestLocalProcessLockProvider {
+
+  private static final Logger LOG = LogManager.getLogger(TestLocalProcessLockProvider.class);
+  private final Configuration hadoopConfiguration = new Configuration();
+  private final LockConfiguration lockConfiguration = new LockConfiguration(new TypedProperties());
+
+  @Test
+  public void testLockAcquisition() {
+    LocalProcessLockProvider localProcessLockProvider = new LocalProcessLockProvider(lockConfiguration, hadoopConfiguration);
+    assertDoesNotThrow(() -> {
+      localProcessLockProvider.lock();
+    });
+    assertDoesNotThrow(() -> {
+      localProcessLockProvider.unlock();
+    });
+  }
+
+  @Test
+  public void testLockReAcquisitionBySameThread() {
+    LocalProcessLockProvider localProcessLockProvider = new LocalProcessLockProvider(lockConfiguration, hadoopConfiguration);
+    assertDoesNotThrow(() -> {
+      localProcessLockProvider.lock();
+    });
+    assertThrows(HoodieLockException.class, () -> {
+      localProcessLockProvider.lock();
+    });
+    assertDoesNotThrow(() -> {
+      localProcessLockProvider.unlock();
+    });
+  }
+
+  @Test
+  public void testLockReAcquisitionByDifferentThread() {
+    LocalProcessLockProvider localProcessLockProvider = new LocalProcessLockProvider(lockConfiguration, hadoopConfiguration);
+
+    // Main test thread
+    assertDoesNotThrow(() -> {
+      localProcessLockProvider.lock();
+    });
+
+    // Another writer thread
+    Thread writer2 = new Thread(new Runnable() {
+      @Override
+      public void run() {
+        assertThrows(HoodieLockException.class, () -> {
+          localProcessLockProvider.lock();
+        });
+      }
+    });
+
+    try {
+      writer2.join();
+    } catch (InterruptedException e) {
+      //
+    }
+
+    assertDoesNotThrow(() -> {
+      localProcessLockProvider.unlock();
+    });
+  }
+
+  @Test
+  public void testTryLockAcquisition() {
+    LocalProcessLockProvider localProcessLockProvider = new LocalProcessLockProvider(lockConfiguration, hadoopConfiguration);
+    Assertions.assertTrue(localProcessLockProvider.tryLock());
+    assertDoesNotThrow(() -> {
+      localProcessLockProvider.unlock();
+    });
+  }
+
+  @Test
+  public void testTryLockAcquisitionWithTimeout() {
+    LocalProcessLockProvider localProcessLockProvider = new LocalProcessLockProvider(lockConfiguration, hadoopConfiguration);
+    Assertions.assertTrue(localProcessLockProvider.tryLock(1, TimeUnit.MILLISECONDS));
+    assertDoesNotThrow(() -> {
+      localProcessLockProvider.unlock();
+    });
+  }
+
+  @Test
+  public void testTryLockReAcquisitionBySameThread() {

Review comment:
   Added a new unit test for your suggested case.
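
   One thing that stands out in the quoted hunk: `testLockReAcquisitionByDifferentThread` 
creates `writer2` and joins it, but no `start()` call appears, so the 
assertion inside `run()` would not execute, and an assertion failure on a 
non-test thread would not fail the test on its own anyway. A minimal sketch of 
one way to check contention from a second thread follows. It uses only 
`java.util.concurrent`; the class and method names are hypothetical, and a 
bare `ReentrantReadWriteLock` stands in for the provider.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical helper: reports what a second thread sees when it tries to take the write lock. */
final class ContentionCheckSketch {

  /** Runs tryLock on a separate thread and returns whether that thread acquired the lock. */
  static boolean secondThreadAcquires(ReentrantReadWriteLock lock) throws InterruptedException {
    AtomicReference<Boolean> acquired = new AtomicReference<>();
    AtomicReference<Throwable> failure = new AtomicReference<>();

    Thread writer2 = new Thread(() -> {
      try {
        boolean got = lock.writeLock().tryLock(50, TimeUnit.MILLISECONDS);
        if (got) {
          lock.writeLock().unlock();
        }
        acquired.set(got);
      } catch (Throwable t) {
        failure.set(t);  // hand the failure back to the calling thread
      }
    }, "writer-2");

    writer2.start();  // without start(), join() returns immediately and run() never executes
    writer2.join();

    if (failure.get() != null) {
      throw new AssertionError("writer-2 failed", failure.get());
    }
    return acquired.get();
  }

  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    lock.writeLock().lock();
    try {
      System.out.println("second thread acquired: " + secondThreadAcquires(lock));
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

   Capturing the result in the calling thread keeps the actual assertion on 
the test thread, where the test framework can see it.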








[GitHub] [hudi] manojpec commented on a change in pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations

2021-12-12 Thread GitBox


manojpec commented on a change in pull request #4259:
URL: https://github.com/apache/hudi/pull/4259#discussion_r767420255



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/LocalProcessLockProvider.java
##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client.transaction.lock;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hudi.common.config.LockConfiguration;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.lock.LockProvider;
+import org.apache.hudi.common.lock.LockState;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieLockException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.jetbrains.annotations.NotNull;
+
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+
+/**
+ * Local process level lock. This {@link LockProvider} implementation is to
+ * guard table from concurrent operations happening in the local JVM process.
+ * 
+ * Note: This Lock provider implementation doesn't allow lock reentrancy.
+ * Attempting to reacquire the lock from the same thread will throw
+ * HoodieLockException. Threads other than the current lock owner, will
+ * block on lock() and return false on tryLock().
+ */
+public class LocalProcessLockProvider implements LockProvider {
+
+  private static final Logger LOG = LogManager.getLogger(ZookeeperBasedLockProvider.class);
+  private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();
+  private final long maxWaitTimeMillis;
+
+  public LocalProcessLockProvider(final LockConfiguration lockConfiguration, final Configuration conf) {
+    TypedProperties typedProperties = lockConfiguration.getConfig();
+    maxWaitTimeMillis = (typedProperties.containsKey(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY)
+        ? lockConfiguration.getConfig().getLong(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY) : 0);
+  }
+
+  @Override
+  public void lock() {

Review comment:
   For completeness of the lock provider, I would like to have lock() 
implemented as well.
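
   For readers following along, the behaviour described in the class Javadoc 
(no reentrancy, other threads block on `lock()` and get `false` from 
`tryLock()`) can be sketched with the JDK alone. This is a simplified 
illustration, not the PR's implementation: `NonReentrantLocalLock` is a 
hypothetical name, `IllegalStateException` stands in for `HoodieLockException`, 
and the lock is an instance field here, whereas the quoted class uses a static 
one.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of a process-local lock that refuses reentrant acquisition. */
final class NonReentrantLocalLock {

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  /** Blocks until the write lock is free; throws if the caller already holds it. */
  void lock() {
    rejectReentry();
    lock.writeLock().lock();
  }

  /** Waits up to the given time; returns false on timeout or interruption. */
  boolean tryLock(long time, TimeUnit unit) {
    rejectReentry();
    try {
      return lock.writeLock().tryLock(time, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the interrupt for the caller
      return false;
    }
  }

  /** Releases the lock if, and only if, the calling thread holds it. */
  void unlock() {
    if (lock.isWriteLockedByCurrentThread()) {
      lock.writeLock().unlock();
    }
  }

  private void rejectReentry() {
    if (lock.isWriteLockedByCurrentThread()) {
      throw new IllegalStateException("lock already held by " + Thread.currentThread().getName());
    }
  }
}
```

   The reentrancy check has to come before the acquire call, because 
`ReentrantReadWriteLock` itself would happily let the owning thread lock again.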








[GitHub] [hudi] manojpec commented on a change in pull request #4259: [HUDI-2962] Local process lock provider to guard single writer process with async table operations

2021-12-12 Thread GitBox


manojpec commented on a change in pull request #4259:
URL: https://github.com/apache/hudi/pull/4259#discussion_r767420030



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/LocalProcessLockProvider.java
##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client.transaction.lock;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hudi.common.config.LockConfiguration;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.lock.LockProvider;
+import org.apache.hudi.common.lock.LockState;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.exception.HoodieLockException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.jetbrains.annotations.NotNull;
+
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+
+/**
+ * Local process level lock. This {@link LockProvider} implementation is to
+ * guard table from concurrent operations happening in the local JVM process.
+ * 
+ * Note: This Lock provider implementation doesn't allow lock reentrancy.
+ * Attempting to reacquire the lock from the same thread will throw
+ * HoodieLockException. Threads other than the current lock owner, will
+ * block on lock() and return false on tryLock().
+ */
+public class LocalProcessLockProvider implements LockProvider {
+
+  private static final Logger LOG = LogManager.getLogger(ZookeeperBasedLockProvider.class);
+  private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();
+  private final long maxWaitTimeMillis;
+
+  public LocalProcessLockProvider(final LockConfiguration lockConfiguration, final Configuration conf) {
+    TypedProperties typedProperties = lockConfiguration.getConfig();
+    maxWaitTimeMillis = (typedProperties.containsKey(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY)
+        ? lockConfiguration.getConfig().getLong(LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY) : 0);
+  }
+
+  @Override
+  public void lock() {
+    LOG.info(getLogMessage(LockState.ACQUIRING));
+    if (LOCK.isWriteLockedByCurrentThread()) {
+      throw new HoodieLockException(getLogMessage(LockState.ALREADY_ACQUIRED));
+    }
+    LOCK.writeLock().lock();
+    LOG.info(getLogMessage(LockState.ACQUIRED));
+  }
+
+  @Override
+  public boolean tryLock() {
+    LOG.info(getLogMessage(LockState.ACQUIRING));
+    if (LOCK.writeLock().isHeldByCurrentThread()) {
+      throw new HoodieLockException(getLogMessage(LockState.ALREADY_ACQUIRED));
+    }
+    final boolean isLockAcquired;
+    try {
+      isLockAcquired = LOCK.writeLock().tryLock(maxWaitTimeMillis, TimeUnit.MILLISECONDS);
+    } catch (InterruptedException e) {
+      throw new HoodieLockException(getLogMessage(LockState.FAILED_TO_ACQUIRE));
+    }
+    LOG.info(getLogMessage(isLockAcquired ? LockState.ACQUIRED : LockState.FAILED_TO_ACQUIRE));
+    return isLockAcquired;
+  }
+
+  @Override
+  public boolean tryLock(long time, @NotNull TimeUnit unit) {
+    LOG.info(getLogMessage(LockState.ACQUIRING));

Review comment:
   right, fixed it.








[GitHub] [hudi] yihua commented on pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.

2021-12-12 Thread GitBox


yihua commented on pull request #3859:
URL: https://github.com/apache/hudi/pull/3859#issuecomment-992121886


   @leesf is there any plan to update the CN docs for 0.9.0, 0.10.0 releases, 
and the current version?






[GitHub] [hudi] yihua commented on a change in pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.

2021-12-12 Thread GitBox


yihua commented on a change in pull request #3859:
URL: https://github.com/apache/hudi/pull/3859#discussion_r767224687



##
File path: 
website/i18n/cn/docusaurus-plugin-content-docs/current/ibm_cos_hoodie.md
##
@@ -1,26 +1,26 @@
 ---
-title: IBM Cloud Object Storage Filesystem
+title: IBM Cloud Object Storage 文件系统
 keywords: [ hudi, hive, ibm, cos, spark, presto]
-summary: In this page, we go over how to configure Hudi with IBM Cloud Object 
Storage filesystem.
+summary: 在本页中,我们讨论在 IBM Cloud Object Storage 文件系统中配置 Hudi 。
 last_modified_at: 2020-10-01T11:38:24-10:00
 language: cn
 ---
-In this page, we explain how to get your Hudi spark job to store into IBM 
Cloud Object Storage.
+在本页中,我们解释如何如何将你的 Hudi Spark 作业存储到 IBM Cloud Object Storage 当中。

Review comment:
   `我们解释如何如何...` -> `我们解释如何...`

##
File path: website/docusaurus.config.js
##
@@ -383,8 +383,20 @@ module.exports = {
 docs: {
   sidebarPath: require.resolve('./sidebars.js'),
   // Please change this to your repo.
-  editUrl:
-'https://github.com/apache/hudi/edit/asf-site/website/docs/',
+  editUrl: ({ version, versionDocsDirPath, docPath, locale }) => {
+    if (locale != this.defaultLocale) {
+      return `https://github.com/apache/hudi/tree/asf-site/website/${versionDocsDirPath}/${docPath}`
+    } else {
+      return `https://github.com/apache/hudi/tree/asf-site/website/i18n/${locale}/docusaurus-plugin-content-${versionDocsDirPath}/${version}/${docPath}`
+    }
+  },
+  // type EditUrlFunction = (params: {
+  //   version: string;
+  //   versionDocsDirPath: string;
+  //   docPath: string;
+  //   permalink: string;
+  //   locale: string;
+  // }) => string | undefined;

Review comment:
   Could you remove these if not used?

##
File path: 
website/i18n/cn/docusaurus-plugin-content-docs/current/migration_guide.md
##
@@ -1,58 +1,46 @@
 ---
-title: Migration Guide
-keywords: [ hudi, migration, use case]
-summary: In this page, we will discuss some available tools for migrating your 
existing dataset into a Hudi dataset
+title: 迁移指南
+keywords: [ hudi, migration, use case, 迁移, 用例]
+summary: 在本页中,我们将讨论有效的工具,他们能将你的现有数据集迁移到 Hudi 数据集。
 last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
-Hudi maintains metadata such as commit timeline and indexes to manage a 
dataset. The commit timelines helps to understand the actions happening on a 
dataset as well as the current state of a dataset. Indexes are used by Hudi to 
maintain a record key to file id mapping to efficiently locate a record. At the 
moment, Hudi supports writing only parquet columnar formats.
-To be able to start using Hudi for your existing dataset, you will need to 
migrate your existing dataset into a Hudi managed dataset. There are a couple 
of ways to achieve this.
+Hudi 维护了元数据,包括提交的时间线和索引,来管理一个数据集。提交的时间线帮助理解一个数据集上发生的操作,以及数据集的当前状态。索引则被 Hudi 
用来维护一个映射到文件 ID 的记录键,它能高效地定位一条记录。目前, Hudi 仅支持写 Parquet 列式格式 。
 
+为了在你的现有数据集上开始使用 Hudi ,你需要将你的现有数据集迁移到 Hudi 管理的数据集中。以下有多种方法实现这个目的。
 
-## Approaches
 
+## 方法
 
-### Use Hudi for new partitions alone
 
-Hudi can be used to manage an existing dataset without affecting/altering the 
historical data already present in the
-dataset. Hudi has been implemented to be compatible with such a mixed dataset 
with a caveat that either the complete
-Hive partition is Hudi managed or not. Thus the lowest granularity at which 
Hudi manages a dataset is a Hive
-partition. Start using the datasource API or the WriteClient to write to the 
dataset and make sure you start writing
-to a new partition or convert your last N partitions into Hudi instead of the 
entire table. Note, since the historical
- partitions are not managed by HUDI, none of the primitives provided by HUDI 
work on the data in those partitions. More concretely, one cannot perform 
upserts or incremental pull on such older partitions not managed by the HUDI 
dataset.
-Take this approach if your dataset is an append only type of dataset and you 
do not expect to perform any updates to existing (or non Hudi managed) 
partitions.
+### 将 Hudi 仅用于新分区
 
+Hudi 可以被用来在不影响/改变数据集历史数据的情况下管理一个现有的数据集。 Hudi 已经实现为能够兼容这样的数据集,不论整个 Hive 分区是否由 
Hudi 管理。因此, Hudi 管理一个数据集的最低粒度是一个 Hive 分区。使用数据源 API 或 WriteClient 
来写入数据集,并确保你开始写入的是一个新分区,或者将过去的 N 个分区而非整张表转换为 Hudi 。需要注意的是,由于历史分区不是由 Hudi 管理的, 
Hudi 提供的任何操作在那些分区上都不生效。更具体地说,无法在这些非 Hudi 管理的旧分区上进行插入更新或增量拉取。

Review comment:
   `Hudi 已经实现为能够兼容这样的数据集,不论整个 Hive 分区是否由 Hudi 管理。`
   -> `Hudi 已经实现兼容这样的数据集,需要注意的是,单个 Hive 分区要么完全由 Hudi 管理,要么不由 Hudi 管理。`

##
File path: 
website/i18n/cn/docusaurus-plugin-content-docs/current/migration_guide.md
##
@@ -1,58 +1,46 @@
 ---
-title: Migration Guide
-keywords: [ hudi, migration, use case]
-summary: In this page, we will discuss some available tools for migrating your 
existing dataset into a Hudi dataset

[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992099257


   
   ## CI report:
   
   * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4223)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992081147


   
   ## CI report:
   
   * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4223)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#issuecomment-992071081


   
   ## CI report:
   
   * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN
   * fe0c868afdbc57efd8628c7380da7469e5108476 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3325)
 
   * e314a3c3cbe9a90b4d5f72d2b46a157985288ea1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4222)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-12 Thread GitBox


hudi-bot commented on pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#issuecomment-992091282


   
   ## CI report:
   
   * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN
   * e314a3c3cbe9a90b4d5f72d2b46a157985288ea1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4222)
 
   
   




[jira] [Created] (HUDI-2993) Support sink multiple tables with schema evolution into Hudi

2021-12-12 Thread Casel Chen (Jira)
Casel Chen created HUDI-2993:


 Summary: Support sink multiple tables with schema evolution into 
Hudi
 Key: HUDI-2993
 URL: https://issues.apache.org/jira/browse/HUDI-2993
 Project: Apache Hudi
  Issue Type: New Feature
  Components: Flink Integration
Reporter: Casel Chen


We have hundreds of OLTP tables that need to be synchronized to the Hudi data 
lake in real time. If we launch one synchronization job per table, resource 
usage and job management become a big challenge. We are therefore eagerly 
looking for a full-database synchronization tool that can synchronize multiple 
tables to the Hudi data lake in a single job. Ideally it would also support 
schema evolution, because the schemas of OLTP tables change often.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Delete partitions without metadata sync to hms

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992061874


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * 550ba7889e0d4c553b5347f26c60e97a27844468 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4219)
 
   * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992081147


   
   ## CI report:
   
   * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4223)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Delete partitions without metadata sync to hms

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992081197


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992075851


   
   ## CI report:
   
   * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
 
   
   




[GitHub] [hudi] xiarixiaoyao commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


xiarixiaoyao commented on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992080922


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992075851


   
   ## CI report:
   
   * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992054747


   
   ## CI report:
   
   * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094)
 
   * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#issuecomment-992064349


   
   ## CI report:
   
   * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN
   * fe0c868afdbc57efd8628c7380da7469e5108476 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3325)
 
   * e314a3c3cbe9a90b4d5f72d2b46a157985288ea1 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-12 Thread GitBox


hudi-bot commented on pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#issuecomment-992071081


   
   ## CI report:
   
   * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN
   * fe0c868afdbc57efd8628c7380da7469e5108476 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3325)
 
   * e314a3c3cbe9a90b4d5f72d2b46a157985288ea1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4222)
 
   
   




[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-12 Thread GitBox


zhangyue19921010 commented on a change in pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#discussion_r767386257



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/fs/FileSystemGuardConfig.java
##
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hudi.common.config.ConfigClassProperty;
+import org.apache.hudi.common.config.ConfigGroups;
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.config.HoodieConfig;
+
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.Properties;
+
+/**
+ * The consistency guard relevant config options.
+ */
+@ConfigClassProperty(name = "FileSystem Guard Configurations",
+    groupName = ConfigGroups.Names.WRITE_CLIENT,
+    description = "The filesystem guard related config options, to help deal with runtime exception like s3 list/get/put/delete performance issues.")
+public class FileSystemGuardConfig  extends HoodieConfig {

Review comment:
   Sure, "FileSystemRetryConfig" is more appropriate. Changed.








[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-12 Thread GitBox


zhangyue19921010 commented on a change in pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#discussion_r767386141



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java
##
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hudi.common.fs.HoodieWrapperFileSystem;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.Random;
+
+public class RetryHelper {
+  private static final Logger LOG = LogManager.getLogger(RetryHelper.class);
+  private HoodieWrapperFileSystem.CheckedFunction func;
+  private int num;
+  private long maxIntervalTime;
+  private long initialIntervalTime = 100L;
+  private String taskInfo = "N/A";
+
+  public RetryHelper() {
+  }
+
+  public RetryHelper(String taskInfo) {
+    this.taskInfo = taskInfo;
+  }
+
+  public RetryHelper tryWith(HoodieWrapperFileSystem.CheckedFunction func) {
+    this.func = func;
+    return this;
+  }
+
+  public RetryHelper tryNum(int num) {
+    this.num = num;
+    return this;
+  }
+
+  public RetryHelper tryTaskInfo(String taskInfo) {
+    this.taskInfo = taskInfo;
+    return this;
+  }
+
+  public RetryHelper tryMaxInterval(long time) {
+    maxIntervalTime = time;
+    return this;
+  }
+
+  public RetryHelper tryInitialInterval(long time) {
+    initialIntervalTime = time;
+    return this;
+  }
+
+  public T start() throws IOException {
+    int retries = 0;
+    boolean success = false;
+    RuntimeException exception = null;
+    T t = null;
+    do {
+      long waitTime = Math.min(getWaitTimeExp(retries), maxIntervalTime);
+      try {
+        t = func.get();
+        success = true;
+        break;
+      } catch (RuntimeException e) {
+        // deal with RuntimeExceptions such like AmazonS3Exception 503
+        exception = e;
+        LOG.warn("Catch RuntimeException " + taskInfo + ", will retry after " + waitTime + " ms.", e);
+        try {
+          Thread.sleep(waitTime);
+        } catch (InterruptedException ex) {
+          // ignore InterruptedException here
+        }
+        retries++;
+      }
+    } while (retries <= num);

Review comment:
   Hmm, `retries` is only incremented when an exception is caught, so the 
`++` probably cannot be moved out of the `catch` block.
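
To illustrate the point about where the counter is incremented, here is a minimal sketch of a retry loop. It is not the PR's `RetryHelper`: the class name `RetryLoopSketch`, the method `runWithRetries`, and its parameters are hypothetical, a fixed sleep stands in for the PR's backoff, and stopping on interruption is a choice of this sketch. The counter changes only inside `catch`, so a successful call returns without touching it.

```java
import java.util.function.Supplier;

/** Hypothetical sketch; names and signature are illustrative, not Hudi's API. */
final class RetryLoopSketch {
  private RetryLoopSketch() {
  }

  /**
   * Calls the action until it succeeds or has failed maxRetries + 1 times.
   * Only RuntimeExceptions are retried; the last one is rethrown.
   */
  static <T> T runWithRetries(Supplier<T> action, int maxRetries, long sleepMs) {
    int retries = 0;
    while (true) {
      try {
        // Success path: return immediately, the retry counter is untouched.
        return action.get();
      } catch (RuntimeException e) {
        if (retries >= maxRetries) {
          throw e; // retry budget exhausted
        }
        // Failure path: the only place where the counter is incremented.
        retries++;
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          // Assumption of this sketch: stop retrying once interrupted.
          Thread.currentThread().interrupt();
          throw e;
        }
      }
    }
  }
}
```

With `maxRetries = 3`, an action that fails twice and then succeeds is invoked three times in total.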








[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-12 Thread GitBox


zhangyue19921010 commented on a change in pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#discussion_r767385839



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java
##
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.hudi.common.fs.HoodieWrapperFileSystem;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.Random;
+
+public class RetryHelper {
+  private static final Logger LOG = LogManager.getLogger(RetryHelper.class);
+  private HoodieWrapperFileSystem.CheckedFunction func;
+  private int num;
+  private long maxIntervalTime;
+  private long initialIntervalTime = 100L;
+  private String taskInfo = "N/A";
+
+  public RetryHelper() {
+  }
+
+  public RetryHelper(String taskInfo) {
+    this.taskInfo = taskInfo;
+  }
+
+  public RetryHelper tryWith(HoodieWrapperFileSystem.CheckedFunction func) {
+    this.func = func;
+    return this;
+  }
+
+  public RetryHelper tryNum(int num) {
+    this.num = num;
+    return this;
+  }
+
+  public RetryHelper tryTaskInfo(String taskInfo) {
+    this.taskInfo = taskInfo;
+    return this;
+  }
+
+  public RetryHelper tryMaxInterval(long time) {
+    maxIntervalTime = time;
+    return this;
+  }
+
+  public RetryHelper tryInitialInterval(long time) {
+    initialIntervalTime = time;
+    return this;
+  }
+
+  public T start() throws IOException {
+    int retries = 0;
+    boolean success = false;
+    RuntimeException exception = null;
+    T t = null;

Review comment:
   Sure thing, changed.








[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-12 Thread GitBox


zhangyue19921010 commented on a change in pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#discussion_r767385794



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/fs/FileSystemGuardConfig.java
##
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hudi.common.config.ConfigClassProperty;
+import org.apache.hudi.common.config.ConfigGroups;
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.config.HoodieConfig;
+
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.Properties;
+
+/**
+ * The consistency guard relevant config options.
+ */
+@ConfigClassProperty(name = "FileSystem Guard Configurations",
+    groupName = ConfigGroups.Names.WRITE_CLIENT,
+    description = "The filesystem guard related config options, to help deal with runtime exception like s3 list/get/put/delete performance issues.")
+public class FileSystemGuardConfig  extends HoodieConfig {
+
+  public static final ConfigProperty FILESYSTEM_RETRY_ENABLE = ConfigProperty
+      .key("hoodie.filesystem.action.retry.enabled")
+      .defaultValue("false")
+      .sinceVersion("0.10.0")
+      .withDocumentation("Enabled to handle S3 list/get/delete etc file system performance issue.");
+
+  public static final ConfigProperty INITIAL_RETRY_INTERVAL_MS = ConfigProperty
+      .key("hoodie.filesystem.action.retry.initial_interval_ms")
+      .defaultValue(100L)
+      .sinceVersion("0.10.0")
+      .withDocumentation("Amount of time (in ms) to wait, before retry to do operations on storage.");
+
+  public static final ConfigProperty MAX_RETRY_INTERVAL_MS = ConfigProperty
+      .key("hoodie.filesystem.action.retry.max_interval_ms")
+      .defaultValue(2000L)
+      .sinceVersion("0.10.0")
+      .withDocumentation("Maximum amount of time (in ms), to wait for next retry.");
+
+  public static final ConfigProperty MAX_RETRY_NUMBERS = ConfigProperty
+      .key("hoodie.filesystem.action.retry.max_numbers")

Review comment:
   We use `(long) Math.pow(2, retryCount) * initialIntervalTime + random.nextInt(100);` to calculate the sleep time before each retry. We may also need `MAX_RETRY_INTERVAL_MS` to cap the duration of a single sleep so that it does not grow too long: `Math.min(getWaitTimeExp(retries), maxIntervalTime)`.
   
   `MAX_RETRY_NUMBERS` limits the number of retries, which in turn bounds the total retry time.
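
   The scheme described in this comment can be sketched roughly as follows. This is only an illustration, not the PR's code: the class name `RetryBackoff`, its method names, and the retry limit are hypothetical. The two expressions are the ones quoted above, and the 100 ms and 2000 ms values used in the usage note mirror the defaults shown in the diff.

```java
import java.util.Random;

/** Hypothetical helper illustrating capped exponential backoff with jitter. */
final class RetryBackoff {
  private final long initialIntervalMs; // cf. INITIAL_RETRY_INTERVAL_MS
  private final long maxIntervalMs;     // cf. MAX_RETRY_INTERVAL_MS
  private final int maxRetries;         // cf. MAX_RETRY_NUMBERS
  private final Random random;

  RetryBackoff(long initialIntervalMs, long maxIntervalMs, int maxRetries, Random random) {
    this.initialIntervalMs = initialIntervalMs;
    this.maxIntervalMs = maxIntervalMs;
    this.maxRetries = maxRetries;
    this.random = random;
  }

  /** Uncapped wait: 2^retryCount * initial interval, plus 0..99 ms of jitter. */
  long waitTimeExp(int retryCount) {
    return (long) Math.pow(2, retryCount) * initialIntervalMs + random.nextInt(100);
  }

  /** Wait actually used before a retry: the exponential value, capped at the maximum interval. */
  long sleepTimeMs(int retryCount) {
    return Math.min(waitTimeExp(retryCount), maxIntervalMs);
  }

  /** True while the number of retries already made is below the configured limit. */
  boolean shouldRetry(int retryCount) {
    return retryCount < maxRetries;
  }
}
```

   With `new RetryBackoff(100L, 2000L, 4, new Random())`, the first wait falls between 100 and 199 ms, and from the fifth doubling onward the cap of 2000 ms applies. Without the cap, each additional retry would roughly double the sleep.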




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-12 Thread GitBox


hudi-bot commented on pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#issuecomment-992064349


   
   ## CI report:
   
   * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN
   * fe0c868afdbc57efd8628c7380da7469e5108476 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3325)
   * e314a3c3cbe9a90b4d5f72d2b46a157985288ea1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#issuecomment-966822668


   
   ## CI report:
   
   * 82ec7c1e3c40af686b9a4dcc5af99ebd3671913d UNKNOWN
   * fe0c868afdbc57efd8628c7380da7469e5108476 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3325)
   
   
   






[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3887: [HUDI-2648] Retry FileSystem action instead of failed directly.

2021-12-12 Thread GitBox


zhangyue19921010 commented on a change in pull request #3887:
URL: https://github.com/apache/hudi/pull/3887#discussion_r767384153



##
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FileSystemGuardConfig.java
##
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.fs;
+
+import org.apache.hudi.common.config.ConfigClassProperty;
+import org.apache.hudi.common.config.ConfigGroups;
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.config.HoodieConfig;
+
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.Properties;
+
+/**
+ * The consistency guard relevant config options.
+ */
+@ConfigClassProperty(name = "FileSystem Guard Configurations",
+groupName = ConfigGroups.Names.WRITE_CLIENT,
+description = "The filesystem guard related config options, to help deal with runtime exception like s3 list/get/put/delete performance issues.")
+public class FileSystemGuardConfig  extends HoodieConfig {
+
+  public static final ConfigProperty FILESYSTEM_RETRY_ENABLE = ConfigProperty
+  .key("hoodie.filesystem.action.retry.enabled")

Review comment:
   Sure, changed.








[GitHub] [hudi] hudi-bot commented on pull request #4291: [HUDI-2990] Delete partitions without metadata sync to hms

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992061874


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * 550ba7889e0d4c553b5347f26c60e97a27844468 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4219)
   * 301d9ab65f3983ecf77b192d4af9401b8d60b059 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4221)
   
   
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4291: [HUDI-2990] Delete partitions without metadata sync to hms

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4291:
URL: https://github.com/apache/hudi/pull/4291#issuecomment-992044866


   
   ## CI report:
   
   * ac71c00df089f959f3178eeb0c6db689f66c5737 UNKNOWN
   * cb41d556852651b47c2971a79f26b12e61ebcaed UNKNOWN
   * f5602d4c7e622973626effc61b831b36125234fd UNKNOWN
   * 550ba7889e0d4c553b5347f26c60e97a27844468 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4219)
   * 301d9ab65f3983ecf77b192d4af9401b8d60b059 UNKNOWN
   
   
   






[GitHub] [hudi] danny0405 commented on issue #4249: [SUPPORT]FLINK CDC WRITE HUDI, restart job get exception:org.apache.hudi.org.apache.avro.InvalidAvroMagicException: Not an Avro data file

2021-12-12 Thread GitBox


danny0405 commented on issue #4249:
URL: https://github.com/apache/hudi/issues/4249#issuecomment-992055023


   Can you try 0.10.0, please? It seems to have been fixed in the latest version.






[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992054747


   
   ## CI report:
   
   * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094)
   * 893fe09af34779c0ef98b732a418c9ba941a2bfc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4220)
   
   
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992053877


   
   ## CI report:
   
   * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094)
   * 893fe09af34779c0ef98b732a418c9ba941a2bfc UNKNOWN
   
   
   






[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimal

2021-12-12 Thread GitBox


xiarixiaoyao commented on a change in pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#discussion_r767376871



##
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala
##
@@ -723,4 +723,26 @@ class TestCOWDataSource extends HoodieClientTestBase {
 val result = spark.sql("select * from tmptable limit 1").collect()(0)
 result.schema.contains(new StructField("partition", StringType, true))
   }
+
+  @Test
+  def testWriteSmallPrecisionDecimalTable(): Unit = {
+val records1 = recordsToStrings(dataGen.generateInserts("001", 5)).toList
+val inputDF1 = spark.read.json(spark.sparkContext.parallelize(records1, 2))
+  .withColumn("shortDecimal", lit(new java.math.BigDecimal(s"2090."))) // create decimalType(8, 4)
+inputDF1.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
+  .mode(SaveMode.Overwrite)
+  .save(basePath)
+
+val records2 = recordsToStrings(dataGen.generateUpdates("002", 5)).toList
+val inputDF2 = spark.read.json(spark.sparkContext.parallelize(records2, 2))
+  .withColumn("shortDecimal", lit(new java.math.BigDecimal(s"2090."))) // create decimalType(8, 4)
+inputDF2.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
+  .mode(SaveMode.Append)
+  .save(basePath)
+assert(spark.read.format("hudi").load(basePath).count() == 5)

Review comment:
   yes, fixed








[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot commented on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-992053877


   
   ## CI report:
   
   * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094)
   * 893fe09af34779c0ef98b732a418c9ba941a2bfc UNKNOWN
   
   
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-12 Thread GitBox


hudi-bot removed a comment on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-988825224


   
   ## CI report:
   
   * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094)
   
   
   





