[GitHub] [hudi] Neuw84 closed pull request #5463: [HUDI-3994] - Added support for initializing DeltaStreamer without a defined Spark Master

2022-05-18 Thread GitBox


Neuw84 closed pull request #5463: [HUDI-3994] - Added support for initializing 
DeltaStreamer without a defined Spark Master
URL: https://github.com/apache/hudi/pull/5463


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5535: [HUDI-4062] Only rollback the failed writes pre upgrade under optimis…

2022-05-18 Thread GitBox


hudi-bot commented on PR #5535:
URL: https://github.com/apache/hudi/pull/5535#issuecomment-1131232070

   
   ## CI report:
   
   * 30ac93b9612baba56303ff095642609facc37c55 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8759)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1131212873

   
   ## CI report:
   
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 00d5fed1954348b749859f8f81fec593422df774 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8758)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #5617: [HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#i…

2022-05-18 Thread GitBox


danny0405 commented on PR #5617:
URL: https://github.com/apache/hudi/pull/5617#issuecomment-1131196973

   > @danny0405 : unless it is a hot fix or some test-related changes, can we not merge it in without getting a stamp from someone. It is a good practice to follow so as to not miss anything.
   
   Oops, I'm sorry, I will follow that in future PRs.
   I did ping you on Slack but no response yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-4119) the first read result is incorrect when the Flink upsert-kafka connector is used with Hudi

2022-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4119:
-
Labels: pull-request-available  (was: )

> the first read result is incorrect when the Flink upsert-kafka connector is
> used with Hudi
> 
>
> Key: HUDI-4119
> URL: https://issues.apache.org/jira/browse/HUDI-4119
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: yanxiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> The first read result is incorrect when the Flink upsert-kafka connector is used with Hudi.
>
> ETL path: Flink upsert-kafka connector -> Hudi table (MOR table, queried via streaming read)
>
> Here is the case:
>
> 1. First read: write two records with the same primary key into Kafka and insert them into the Hudi table. The query result should be three change rows: +I first record, -U first record, +U second record. But on the first query of the Hudi table, all rows carried the +I operation (+I first record, +I first record, +I second record) and there was no update operation. The three +I rows affect Hudi's subsequent ETL: the grouped (groupBy) data becomes inaccurate.
> 2. Second read: exit the first query and restart the streaming query on the Hudi table; the results are then correct: +I first record, -U first record, +U second record.
>
> Reason: there is a bug in the program. When no data log file has been generated, the schema does not include the '_hoodie_operation' column. Please refer to the following link for details:
> [https://www.jianshu.com/p/29f9ec5e606e]
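
For context, a minimal sketch of the pipeline described above, written with the Flink Table API; the topic, bootstrap servers, and table path below are illustrative assumptions, not values from the reporter's job.

```java
// Minimal sketch of the reported setup: upsert-kafka changelog source -> Hudi MOR table
// with changelog mode, read back as a stream. Topic, servers, and path are assumptions.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UpsertKafkaToHudiSketch {
  public static void main(String[] args) {
    TableEnvironment tEnv =
        TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

    // Upsert source: two records with the same primary key produce +I / -U / +U rows.
    tEnv.executeSql(
        "CREATE TABLE src (id STRING, val STRING, PRIMARY KEY (id) NOT ENFORCED) WITH ("
            + " 'connector' = 'upsert-kafka',"
            + " 'topic' = 'demo_topic',"                            // assumption
            + " 'properties.bootstrap.servers' = 'localhost:9092'," // assumption
            + " 'key.format' = 'json',"
            + " 'value.format' = 'json')");

    // Hudi MOR sink with changelog mode, so the streaming read should preserve -U/+U rows.
    tEnv.executeSql(
        "CREATE TABLE hudi_tbl (id STRING, val STRING, PRIMARY KEY (id) NOT ENFORCED) WITH ("
            + " 'connector' = 'hudi',"
            + " 'path' = 'file:///tmp/hudi/demo',"                  // assumption
            + " 'table.type' = 'MERGE_ON_READ',"
            + " 'changelog.enabled' = 'true',"
            + " 'read.streaming.enabled' = 'true')");

    tEnv.executeSql("INSERT INTO hudi_tbl SELECT id, val FROM src");
  }
}
```

With changelog mode enabled, the streaming read is expected to surface the -U/+U rows described in the expected result above.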



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [hudi] hudi-bot commented on pull request #5626: [HUDI-4119] the first read result is incorrect when the Flink upsert-kafka connector is used with Hudi

2022-05-18 Thread GitBox


hudi-bot commented on PR #5626:
URL: https://github.com/apache/hudi/pull/5626#issuecomment-1131186033

   
   ## CI report:
   
   * 284ce7503ad459a635148b0761bb3a5ebc9b9de6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8756)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amit-ranjan-de commented on issue #5599: [SUPPORT] File names in S3 do not match the file names in the latest .commit file

2022-05-18 Thread GitBox


amit-ranjan-de commented on issue #5599:
URL: https://github.com/apache/hudi/issues/5599#issuecomment-1131174129

   Hi @xushiyan thanks for your comment!
   
   We don't have any TTL on the S3 bucket.
   
   For the command:
   ```
   aws s3api get-bucket-lifecycle --bucket 
   ```
   
   We receive the following result:
   ```
   {
     "Rules": [
       {
         "ID": "intelligent-tiering",
         "Status": "Enabled",
         "Transition": {
           "Days": 0,
           "StorageClass": "INTELLIGENT_TIERING"
         }
       },
       {
         "ID": "expire-noncurrent",
         "Status": "Enabled",
         "NoncurrentVersionExpiration": {
           "NoncurrentDays": 30
         }
       }
     ]
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #5617: [HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#i…

2022-05-18 Thread GitBox


nsivabalan commented on code in PR #5617:
URL: https://github.com/apache/hudi/pull/5617#discussion_r876571323


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##
@@ -1466,8 +1466,6 @@ protected final HoodieTable initTable(WriteOperationType operationType, Option<String> instantTime) {

Review Comment:
   https://github.com/apache/hudi/pull/4739 Do you happen to know the reason?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #5617: [HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#i…

2022-05-18 Thread GitBox


nsivabalan commented on PR #5617:
URL: https://github.com/apache/hudi/pull/5617#issuecomment-1131172942

   @danny0405 : unless it is a hot fix or some test-related changes, can we not merge it in without getting a stamp from someone.
   It is a good practice to follow so as to not miss anything.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5535: [HUDI-4062] Only rollback the failed writes pre upgrade under optimis…

2022-05-18 Thread GitBox


hudi-bot commented on PR #5535:
URL: https://github.com/apache/hudi/pull/5535#issuecomment-1131166306

   
   ## CI report:
   
   * c4be16709667c4b2615f37eed1859cd384b93f32 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8518)
 
   * 30ac93b9612baba56303ff095642609facc37c55 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8759)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1131163451

   
   ## CI report:
   
   * c34256ac98d03c787d264e56e35a7058d4273442 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8755)
 
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 00d5fed1954348b749859f8f81fec593422df774 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8758)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5535: [HUDI-4062] Only rollback the failed writes pre upgrade under optimis…

2022-05-18 Thread GitBox


hudi-bot commented on PR #5535:
URL: https://github.com/apache/hudi/pull/5535#issuecomment-1131163097

   
   ## CI report:
   
   * c4be16709667c4b2615f37eed1859cd384b93f32 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8518)
 
   * 30ac93b9612baba56303ff095642609facc37c55 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #5535: [HUDI-4062] Only rollback the failed writes pre upgrade under optimis…

2022-05-18 Thread GitBox


danny0405 commented on code in PR #5535:
URL: https://github.com/apache/hudi/pull/5535#discussion_r876565034


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java:
##
@@ -1558,13 +1558,15 @@ private void tryUpgrade(HoodieTableMetaClient metaClient, Option<String> instantTime) {
         new UpgradeDowngrade(metaClient, config, context, upgradeDowngradeHelper);

     if (upgradeDowngrade.needsUpgradeOrDowngrade(HoodieTableVersion.current())) {
-      // Ensure no inflight commits by setting EAGER policy and explicitly cleaning all failed commits
-      List<String> instantsToRollback = getInstantsToRollback(metaClient, HoodieFailedWritesCleaningPolicy.EAGER, instantTime);
+      if (config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl()) {
+        // Ensure no inflight commits by setting EAGER policy and explicitly cleaning all failed commits
+        List<String> instantsToRollback = getInstantsToRollback(metaClient, HoodieFailedWritesCleaningPolicy.EAGER, instantTime);

-      Map<String, Option<HoodiePendingRollbackInfo>> pendingRollbacks = getPendingRollbackInfos(metaClient);
-      instantsToRollback.forEach(entry -> pendingRollbacks.putIfAbsent(entry, Option.empty()));
+        Map<String, Option<HoodiePendingRollbackInfo>> pendingRollbacks = getPendingRollbackInfos(metaClient);
+        instantsToRollback.forEach(entry -> pendingRollbacks.putIfAbsent(entry, Option.empty()));

Review Comment:
   Hello @nsivabalan @n3nash , I have refactored the code a little to:
   
   1. move the failed-writes cleaning all together to when the new commit starts
   2. tweak the cleaning strategy for commit actions: force the clean if the metadata table is enabled, to cover the case where the metadata table commit succeeds but the dataset table commit fails (see the sketch below)
   
   Please review again if you have time, thanks so much in advance ~
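
A hedged sketch of the decision in point 2, assuming `HoodieWriteConfig#isMetadataTableEnabled()` for the metadata-table check; the helper below is illustrative and not the code in this PR:

```java
// Hedged sketch only: the cleaning strategy described in point 2 above. The class and
// method names are illustrative; isMetadataTableEnabled() is assumed to exist on
// HoodieWriteConfig for the metadata-table check.
import org.apache.hudi.config.HoodieWriteConfig;

final class FailedWritesCleaningSketch {
  static boolean shouldEagerlyCleanFailedWrites(HoodieWriteConfig config) {
    if (config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl()) {
      // Under optimistic concurrency control, only force the eager clean when the metadata
      // table is enabled, covering the case where the metadata table commit succeeded but
      // the dataset table commit failed.
      return config.isMetadataTableEnabled();
    }
    // Single-writer mode keeps the existing EAGER cleaning behaviour.
    return true;
  }
}
```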



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1131139052

   
   ## CI report:
   
   * c34256ac98d03c787d264e56e35a7058d4273442 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8755)
 
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 00d5fed1954348b749859f8f81fec593422df774 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (HUDI-4114) Remove the unnecessary fs view sync for BaseWriteClient#initTable

2022-05-18 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-4114.
--

> Remove the unnecessary fs view sync for BaseWriteClient#initTable
> -
>
> Key: HUDI-4114
> URL: https://issues.apache.org/jira/browse/HUDI-4114
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [hudi] zhilinli123 commented on issue #4881: Full incremental Enable index loading to discover duplicate data(index.bootstrap.enabled)

2022-05-18 Thread GitBox


zhilinli123 commented on issue #4881:
URL: https://github.com/apache/hudi/issues/4881#issuecomment-1131081185

   > @zhilinli123 @danny0405 : are there any follow-ups pending on this issue? If not, can we please close it out.
   
   This problem should still exist; maintainers can help confirm it when they have time.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[hudi] branch master updated (6573469e73 -> 6f37863ba8)

2022-05-18 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 6573469e73 [HUDI-4116] Unify clustering/compaction related procedures' 
output type (#5620)
 add 6f37863ba8 [HUDI-4114] Remove the unnecessary fs view sync for 
BaseWriteClient#initTable (#5617)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java | 2 --
 1 file changed, 2 deletions(-)



[GitHub] [hudi] danny0405 merged pull request #5617: [HUDI-4114] Remove the unnecessary fs view sync for BaseWriteClient#i…

2022-05-18 Thread GitBox


danny0405 merged PR #5617:
URL: https://github.com/apache/hudi/pull/5617


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5626: the first read result is incorrect when the Flink upsert-kafka connector is used with Hudi

2022-05-18 Thread GitBox


hudi-bot commented on PR #5626:
URL: https://github.com/apache/hudi/pull/5626#issuecomment-1131058558

   
   ## CI report:
   
   * 422c808939ad98051f4b83bb2353191167b6edb0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8740)
 
   * 284ce7503ad459a635148b0761bb3a5ebc9b9de6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8756)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1131053451

   
   ## CI report:
   
   * c34256ac98d03c787d264e56e35a7058d4273442 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8755)
 
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   * 00d5fed1954348b749859f8f81fec593422df774 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5626: the first read result is incorrect when the Flink upsert-kafka connector is used with Hudi

2022-05-18 Thread GitBox


hudi-bot commented on PR #5626:
URL: https://github.com/apache/hudi/pull/5626#issuecomment-1131053372

   
   ## CI report:
   
   * 422c808939ad98051f4b83bb2353191167b6edb0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8740)
 
   * 284ce7503ad459a635148b0761bb3a5ebc9b9de6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1131042547

   
   ## CI report:
   
   * c34256ac98d03c787d264e56e35a7058d4273442 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8755)
 
   * d0f078159313f8b35a41b1d1e016583204811383 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1131037264

   
   ## CI report:
   
   * c34256ac98d03c787d264e56e35a7058d4273442 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-3384) Implement Spark-specific FileWriters

2022-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3384:
-
Labels: pull-request-available  (was: )

> Implement Spark-specific FileWriters
> 
>
> Key: HUDI-3384
> URL: https://issues.apache.org/jira/browse/HUDI-3384
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> As per RFC-46
> `HoodieFileWriter`s will be
>  # Accepting `HoodieRecord`
>  # Will be engine-specific (so that they're able to handle internal record 
> representation)
>  
> Initially, we will focus on Spark with other engines to follow.
>  
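
A rough, hedged sketch of the writer shape described above; the interface and names below are illustrative assumptions, not the actual Hudi API this work introduces.

```java
// Illustrative only: the RFC-46 direction sketched as an interface. The writer accepts
// HoodieRecord and is parameterized by the engine-native representation (e.g. Spark's
// InternalRow), so each engine supplies its own implementation. Names are assumptions.
import java.io.IOException;
import org.apache.hudi.common.model.HoodieRecord;

interface EngineSpecificFileWriterSketch<T> {
  void write(String recordKey, HoodieRecord<T> record) throws IOException;

  boolean canWrite();   // e.g. false once the target file size has been reached

  void close() throws IOException;
}
```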



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [hudi] minihippo opened a new pull request, #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-05-18 Thread GitBox


minihippo opened a new pull request, #5629:
URL: https://github.com/apache/hudi/pull/5629

   ## What is the purpose of the pull request
   
   RFC-46 spark specific file reader/writer based on internal row
   
   ## Brief change log
   
   - add spark file reader of parquet/orc/HFile
   - add spark file writer of parquet/orc/HFile
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] liujinhui1994 closed pull request #5618: [HUDI-3555] Re-use spark config for parquet timestamp format

2022-05-18 Thread GitBox


liujinhui1994 closed pull request #5618: [HUDI-3555]  Re-use spark config for 
parquet timestamp format 
URL: https://github.com/apache/hudi/pull/5618


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (HUDI-4117) Remove timeout rollback for flink compaction

2022-05-18 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-4117.
--

> Remove timeout rollback for flink compaction
> 
>
> Key: HUDI-4117
> URL: https://issues.apache.org/jira/browse/HUDI-4117
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.11.0
>Reporter: Danny Chen
>Priority: Major
> Fix For: 0.11.1, 0.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HUDI-4117) Remove timeout rollback for flink compaction

2022-05-18 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539165#comment-17539165
 ] 

Danny Chen commented on HUDI-4117:
--

Fixed via master branch: 551aa959c57721a5cc4d3f63f79e0201978980a2

> Remove timeout rollback for flink compaction
> 
>
> Key: HUDI-4117
> URL: https://issues.apache.org/jira/browse/HUDI-4117
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.11.0
>Reporter: Danny Chen
>Priority: Major
> Fix For: 0.11.1, 0.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HUDI-4101) BucketIndexPartitioner should take partition path for better dispersion

2022-05-18 Thread Forward Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Forward Xu resolved HUDI-4101.
--

> BucketIndexPartitioner should take partition path for better dispersion
> ---
>
> Key: HUDI-4101
> URL: https://issues.apache.org/jira/browse/HUDI-4101
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.1, 0.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HUDI-4101) BucketIndexPartitioner should take partition path for better dispersion

2022-05-18 Thread Forward Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Forward Xu reassigned HUDI-4101:


Assignee: Danny Chen

> BucketIndexPartitioner should take partition path for better dispersion
> ---
>
> Key: HUDI-4101
> URL: https://issues.apache.org/jira/browse/HUDI-4101
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.1, 0.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[hudi] branch master updated: [HUDI-4116] Unify clustering/compaction related procedures' output type (#5620)

2022-05-18 Thread forwardxu
This is an automated email from the ASF dual-hosted git repository.

forwardxu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 6573469e73 [HUDI-4116] Unify clustering/compaction related procedures' 
output type (#5620)
6573469e73 is described below

commit 6573469e73ea51ed6d1c24504e8be5abfa91c642
Author: huberylee 
AuthorDate: Thu May 19 09:48:03 2022 +0800

[HUDI-4116] Unify clustering/compaction related procedures' output type 
(#5620)

* Unify clustering/compaction related procedures' output type

* Address review comments
---
 .../scala/org/apache/hudi/HoodieCLIUtils.scala |  15 ++-
 .../hudi/command/CompactionHoodiePathCommand.scala |  11 +--
 .../command/CompactionHoodieTableCommand.scala |  13 +--
 .../command/CompactionShowHoodiePathCommand.scala  |  12 +--
 .../command/CompactionShowHoodieTableCommand.scala |  12 +--
 .../procedures/RunClusteringProcedure.scala|  34 ++-
 .../procedures/RunCompactionProcedure.scala|  29 --
 .../procedures/ShowClusteringProcedure.scala   |  37 ++--
 .../procedures/ShowCompactionProcedure.scala   |  16 ++--
 .../hudi/procedure/TestClusteringProcedure.scala   | 103 +++--
 .../hudi/procedure/TestCompactionProcedure.scala   |  78 
 11 files changed, 247 insertions(+), 113 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieCLIUtils.scala
 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieCLIUtils.scala
index 58c3324823..552e3cfc9b 100644
--- 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieCLIUtils.scala
+++ 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieCLIUtils.scala
@@ -19,14 +19,14 @@
 
 package org.apache.hudi
 
+import org.apache.hudi.avro.model.HoodieClusteringGroup
 import org.apache.hudi.client.SparkRDDWriteClient
 import org.apache.hudi.common.table.{HoodieTableMetaClient, 
TableSchemaResolver}
 import org.apache.spark.api.java.JavaSparkContext
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.hudi.HoodieSqlCommonUtils.withSparkConf
 
-import scala.collection.JavaConverters.mapAsJavaMapConverter
-import scala.collection.immutable.Map
+import scala.collection.JavaConverters.{collectionAsScalaIterableConverter, 
mapAsJavaMapConverter}
 
 object HoodieCLIUtils {
 
@@ -46,4 +46,15 @@ object HoodieCLIUtils {
 DataSourceUtils.createHoodieClient(jsc, schemaStr, basePath,
   metaClient.getTableConfig.getTableName, finalParameters.asJava)
   }
+
+  def extractPartitions(clusteringGroups: Seq[HoodieClusteringGroup]): String 
= {
+var partitionPaths: Seq[String] = Seq.empty
+clusteringGroups.foreach(g =>
+  g.getSlices.asScala.foreach(slice =>
+partitionPaths = partitionPaths :+ slice.getPartitionPath
+  )
+)
+
+partitionPaths.sorted.mkString(",")
+  }
 }
diff --git 
a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/CompactionHoodiePathCommand.scala
 
b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/CompactionHoodiePathCommand.scala
index 5b513f7500..57aff092b7 100644
--- 
a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/CompactionHoodiePathCommand.scala
+++ 
b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/CompactionHoodiePathCommand.scala
@@ -19,11 +19,9 @@ package org.apache.spark.sql.hudi.command
 
 import org.apache.hudi.common.model.HoodieTableType
 import org.apache.hudi.common.table.HoodieTableMetaClient
-
-import org.apache.spark.sql.catalyst.expressions.{Attribute, 
AttributeReference}
+import org.apache.spark.sql.catalyst.expressions.Attribute
 import 
org.apache.spark.sql.catalyst.plans.logical.CompactionOperation.{CompactionOperation,
 RUN, SCHEDULE}
 import org.apache.spark.sql.hudi.command.procedures.{HoodieProcedureUtils, 
RunCompactionProcedure}
-import org.apache.spark.sql.types.StringType
 import org.apache.spark.sql.{Row, SparkSession}
 import org.apache.spark.unsafe.types.UTF8String
 
@@ -50,10 +48,5 @@ case class CompactionHoodiePathCommand(path: String,
 RunCompactionProcedure.builder.get().build.call(procedureArgs)
   }
 
-  override val output: Seq[Attribute] = {
-operation match {
-  case RUN => Seq.empty
-  case SCHEDULE => Seq(AttributeReference("instant", StringType, nullable 
= false)())
-}
-  }
+  override val output: Seq[Attribute] = 
RunCompactionProcedure.builder.get().build.outputType.toAttributes
 }
diff --git 
a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/CompactionHoodieTableCommand.scala
 
b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/CompactionHoodieTableCommand.scala
index 5e3623

[GitHub] [hudi] XuQianJin-Stars merged pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

2022-05-18 Thread GitBox


XuQianJin-Stars merged PR #5620:
URL: https://github.com/apache/hudi/pull/5620


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HUDI-4121) The java client missing some supports on the conflict handling?

2022-05-18 Thread Yong Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539158#comment-17539158
 ] 

Yong Zhang commented on HUDI-4121:
--

I found that SparkRDDWriteClient implements preCommit:
[https://github.com/apache/hudi/blob/551aa959c57721a5cc4d3f63f79e0201978980a2/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java#L471]

But the Java client does not do anything similar.
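
To make the gap concrete, here is a hedged sketch of a pre-commit conflict check on the Java client side; `ConflictResolver` and its method are hypothetical stand-ins, not existing Hudi APIs.

```java
// Hedged illustration of the gap described above: the Spark client runs conflict
// resolution in preCommit() before the completed commit file is written; the Java client
// currently has no equivalent step. ConflictResolver is a hypothetical stand-in,
// not an existing Hudi API.
import java.util.Set;

final class JavaClientPreCommitSketch {

  interface ConflictResolver {
    /** Hypothetical hook: throw if a concurrently completed commit touched the same file groups. */
    void resolveWriteConflicts(String instantTime, Set<String> touchedFileIds);
  }

  static void commitWithConflictCheck(ConflictResolver resolver,
                                      String instantTime,
                                      Set<String> touchedFileIds,
                                      Runnable doCommit) {
    // Mirrors the behaviour described for SparkRDDWriteClient: validate against other
    // writers' completed commits before transitioning this instant to completed.
    resolver.resolveWriteConflicts(instantTime, touchedFileIds);
    doCommit.run();
  }
}
```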

> The java client missing some supports on the conflict handling?
> ---
>
> Key: HUDI-4121
> URL: https://issues.apache.org/jira/browse/HUDI-4121
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Yong Zhang
>Priority: Major
>
> When I enable the concurrency in the hudi java writer, it looks like 
> something is wrong when committing at the same time.
>  
> The exception:
>  
> ```
>  
> {{org.apache.hudi.exception.HoodieIOException: Failed to create file 
> file:/tmp/integration/hudi/.hoodie/20220517094051766.commit
>   at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:745)
>  ~[hudi-common-0.11.0.jar:0.11.0]
>   at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:560)
>  ~[hudi-common-0.11.0.jar:0.11.0]
>   at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:536)
>  ~[hudi-common-0.11.0.jar:0.11.0]
>   at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveAsComplete(HoodieActiveTimeline.java:183)
>  ~[hudi-common-0.11.0.jar:0.11.0]
>   at 
> org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:270)
>  ~[hudi-client-common-0.11.0.jar:0.11.0]
>   at 
> org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:234)
>  ~[hudi-client-common-0.11.0.jar:0.11.0]
>   at 
> org.apache.hudi.client.HoodieJavaWriteClient.commit(HoodieJavaWriteClient.java:88)
>  ~[hudi-java-client-0.11.0.jar:0.11.0]
>   at 
> org.apache.hudi.client.HoodieJavaWriteClient.commit(HoodieJavaWriteClient.java:51)
>  ~[hudi-java-client-0.11.0.jar:0.11.0]
>   at 
> org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:206)
>  ~[hudi-client-common-0.11.0.jar:0.11.0]
>   at 
> org.apache.pulsar.ecosystem.io.sink.hudi.BufferedConnectWriter.flushRecords(BufferedConnectWriter.java:82)
>  ~[PqY5lYEJSWPWMDq7E5HC2Q/:?]
>   at 
> org.apache.pulsar.ecosystem.io.sink.hudi.HoodieWriter.flush(HoodieWriter.java:85)
>  ~[PqY5lYEJSWPWMDq7E5HC2Q/:?]
>   at 
> org.apache.pulsar.ecosystem.io.sink.SinkWriter.commitIfNeed(SinkWriter.java:128)
>  ~[PqY5lYEJSWPWMDq7E5HC2Q/:?]
>   at 
> org.apache.pulsar.ecosystem.io.sink.SinkWriter.run(SinkWriter.java:113) 
> [PqY5lYEJSWPWMDq7E5HC2Q/:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_201]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_201]
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-common-4.1.77.Final.jar:4.1.77.Final]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
> Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already 
> exists: file:/tmp/integration/hudi/.hoodie/20220517094051766.commit
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:315) 
> ~[hadoop-common-3.2.2.jar:?]
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:353) 
> ~[hadoop-common-3.2.2.jar:?]
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:403)
>  ~[hadoop-common-3.2.2.jar:?]
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:466) 
> ~[hadoop-common-3.2.2.jar:?]
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:445) 
> ~[hadoop-common-3.2.2.jar:?]
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1125) 
> ~[hadoop-common-3.2.2.jar:?]
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1105) 
> ~[hadoop-common-3.2.2.jar:?]
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:994) 
> ~[hadoop-common-3.2.2.jar:?]
>   at 
> org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$create$2(HoodieWrapperFileSystem.java:222)
>  ~[hudi-common-0.11.0.jar:0.11.0]
>   at 
> org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:101)
>  ~[hudi-common-0.11.0.jar:0.11.0]
>   at 
> org.apache.hudi.common.fs.HoodieWrapperFileSystem.create(HoodieWrapperFileSystem.java:221)
>  ~[hudi-c

[jira] [Created] (HUDI-4121) The java client missing some supports on the conflict handling?

2022-05-18 Thread Yong Zhang (Jira)
Yong Zhang created HUDI-4121:


 Summary: The java client missing some supports on the conflict 
handling?
 Key: HUDI-4121
 URL: https://issues.apache.org/jira/browse/HUDI-4121
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Yong Zhang


When I enable the concurrency in the hudi java writer, it looks like something 
is wrong when committing at the same time.

 

The exception:

 

```
 
{{org.apache.hudi.exception.HoodieIOException: Failed to create file 
file:/tmp/integration/hudi/.hoodie/20220517094051766.commit
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:745)
 ~[hudi-common-0.11.0.jar:0.11.0]
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:560)
 ~[hudi-common-0.11.0.jar:0.11.0]
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:536)
 ~[hudi-common-0.11.0.jar:0.11.0]
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveAsComplete(HoodieActiveTimeline.java:183)
 ~[hudi-common-0.11.0.jar:0.11.0]
at 
org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:270)
 ~[hudi-client-common-0.11.0.jar:0.11.0]
at 
org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:234)
 ~[hudi-client-common-0.11.0.jar:0.11.0]
at 
org.apache.hudi.client.HoodieJavaWriteClient.commit(HoodieJavaWriteClient.java:88)
 ~[hudi-java-client-0.11.0.jar:0.11.0]
at 
org.apache.hudi.client.HoodieJavaWriteClient.commit(HoodieJavaWriteClient.java:51)
 ~[hudi-java-client-0.11.0.jar:0.11.0]
at 
org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:206)
 ~[hudi-client-common-0.11.0.jar:0.11.0]
at 
org.apache.pulsar.ecosystem.io.sink.hudi.BufferedConnectWriter.flushRecords(BufferedConnectWriter.java:82)
 ~[PqY5lYEJSWPWMDq7E5HC2Q/:?]
at 
org.apache.pulsar.ecosystem.io.sink.hudi.HoodieWriter.flush(HoodieWriter.java:85)
 ~[PqY5lYEJSWPWMDq7E5HC2Q/:?]
at 
org.apache.pulsar.ecosystem.io.sink.SinkWriter.commitIfNeed(SinkWriter.java:128)
 ~[PqY5lYEJSWPWMDq7E5HC2Q/:?]
at 
org.apache.pulsar.ecosystem.io.sink.SinkWriter.run(SinkWriter.java:113) 
[PqY5lYEJSWPWMDq7E5HC2Q/:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_201]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_201]
at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 [netty-common-4.1.77.Final.jar:4.1.77.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already 
exists: file:/tmp/integration/hudi/.hoodie/20220517094051766.commit
at 
org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:315) 
~[hadoop-common-3.2.2.jar:?]
at 
org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:353) 
~[hadoop-common-3.2.2.jar:?]
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:403)
 ~[hadoop-common-3.2.2.jar:?]
at 
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:466) 
~[hadoop-common-3.2.2.jar:?]
at 
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:445) 
~[hadoop-common-3.2.2.jar:?]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1125) 
~[hadoop-common-3.2.2.jar:?]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1105) 
~[hadoop-common-3.2.2.jar:?]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:994) 
~[hadoop-common-3.2.2.jar:?]
at 
org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$create$2(HoodieWrapperFileSystem.java:222)
 ~[hudi-common-0.11.0.jar:0.11.0]
at 
org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:101)
 ~[hudi-common-0.11.0.jar:0.11.0]
at 
org.apache.hudi.common.fs.HoodieWrapperFileSystem.create(HoodieWrapperFileSystem.java:221)
 ~[hudi-common-0.11.0.jar:0.11.0]
at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:740)
 ~[hudi-common-0.11.0.jar:0.11.0]
... 16 more}}

```

 

And my hudi writer configuration:

```
 
{{"hoodie.table.name": "hudi-connector-test",
"hoodie.table.type": "COPY_ON_WRITE",
"hoodie.base.path": "file:///tmp/integration/hudi",
"hoodie.clean.async": "true",
"hoodie.write.concurrency.mode": "optimistic_concurrency_control",
"hoodie.cleaner.policy.failed.writes": "LAZY",
"hoodie.write.lock.provider": 
"org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
"

[jira] [Resolved] (HUDI-4080) Verify the Hudi Sink Connector for Kafka Connect on Confluent Platform

2022-05-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu resolved HUDI-4080.
--

> Verify the Hudi Sink Connector for Kafka Connect on Confluent Platform
> --
>
> Key: HUDI-4080
> URL: https://issues.apache.org/jira/browse/HUDI-4080
> Project: Apache Hudi
>  Issue Type: Task
>  Components: kafka-connect
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.12.0
>
>
> Get a working run of Hudi Sink Connector for Kafka Connect on Confluent 
> Platform which set up all necessary components, including Kafka broker, 
> schema registry, zookeeper, etc.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HUDI-4080) Verify the Hudi Sink Connector for Kafka Connect on Confluent Platform

2022-05-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4080:
-
Status: In Progress  (was: Open)

> Verify the Hudi Sink Connector for Kafka Connect on Confluent Platform
> --
>
> Key: HUDI-4080
> URL: https://issues.apache.org/jira/browse/HUDI-4080
> Project: Apache Hudi
>  Issue Type: Task
>  Components: kafka-connect
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.12.0
>
>
> Get a working run of Hudi Sink Connector for Kafka Connect on Confluent 
> Platform which set up all necessary components, including Kafka broker, 
> schema registry, zookeeper, etc.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HUDI-4080) Verify the Hudi Sink Connector for Kafka Connect on Confluent Platform

2022-05-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4080:
-
Status: Patch Available  (was: In Progress)

> Verify the Hudi Sink Connector for Kafka Connect on Confluent Platform
> --
>
> Key: HUDI-4080
> URL: https://issues.apache.org/jira/browse/HUDI-4080
> Project: Apache Hudi
>  Issue Type: Task
>  Components: kafka-connect
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.12.0
>
>
> Get a working run of Hudi Sink Connector for Kafka Connect on Confluent 
> Platform which set up all necessary components, including Kafka broker, 
> schema registry, zookeeper, etc.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HUDI-4049) Upgrade Hudi version in the connector

2022-05-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4049:
-
Flagged: Impediment

> Upgrade Hudi version in the connector
> -
>
> Key: HUDI-4049
> URL: https://issues.apache.org/jira/browse/HUDI-4049
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [hudi] xushiyan commented on issue #4223: [SUPPORT] Flink hudi Sink

2022-05-18 Thread GitBox


xushiyan commented on issue #4223:
URL: https://github.com/apache/hudi/issues/4223#issuecomment-1130820041

   Closing due to inactivity.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xushiyan closed issue #4223: [SUPPORT] Flink hudi Sink

2022-05-18 Thread GitBox


xushiyan closed issue #4223: [SUPPORT] Flink hudi Sink
URL: https://github.com/apache/hudi/issues/4223


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xushiyan commented on issue #4993: [SUPPORT] Flink Streaming read about dynamic day partition

2022-05-18 Thread GitBox


xushiyan commented on issue #4993:
URL: https://github.com/apache/hudi/issues/4993#issuecomment-1130801236

   @BruceKellan it's not very clear what the issue is. What do you mean by this?
   
   > In next day, dwd_data's max time was '2022-03-08 23:59:59.000'.
   > It seem that it cannot read new data in day=2022-03-09


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5430: [HUDI-3979][Stacked on 5428] Optimize out mandatory columns when no merging is performed

2022-05-18 Thread GitBox


hudi-bot commented on PR #5430:
URL: https://github.com/apache/hudi/pull/5430#issuecomment-1130738364

   
   ## CI report:
   
   * 5b241061bde4ca74684f07677c7f5afa828e269c UNKNOWN
   * f2e15cb0a9d8cef06be2321b1346bf07e5bf6d7b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8753)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5428: [HUDI-3896] Porting Nested Schema Pruning optimization for Hudi's custom Relations

2022-05-18 Thread GitBox


hudi-bot commented on PR #5428:
URL: https://github.com/apache/hudi/pull/5428#issuecomment-1130722086

   
   ## CI report:
   
   * a0b466cb8e3f166c716f5776ac4c80f778dd9936 UNKNOWN
   * 5fb2606d22ec629b8def68ed348f84cbc71c1eeb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8752)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5430: [HUDI-3979][Stacked on 5428] Optimize out mandatory columns when no merging is performed

2022-05-18 Thread GitBox


hudi-bot commented on PR #5430:
URL: https://github.com/apache/hudi/pull/5430#issuecomment-1130657196

   
   ## CI report:
   
   * 5b241061bde4ca74684f07677c7f5afa828e269c UNKNOWN
   * ddb01d5b6079cadc00947f2c8dd861632434776b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8335)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8336)
 
   * f2e15cb0a9d8cef06be2321b1346bf07e5bf6d7b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8753)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5428: [HUDI-3896] Porting Nested Schema Pruning optimization for Hudi's custom Relations

2022-05-18 Thread GitBox


hudi-bot commented on PR #5428:
URL: https://github.com/apache/hudi/pull/5428#issuecomment-1130635108

   
   ## CI report:
   
   * a0b466cb8e3f166c716f5776ac4c80f778dd9936 UNKNOWN
   * 381d2614c89dccd959934a8ff3269085b48983ee Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8334)
 
   * 5fb2606d22ec629b8def68ed348f84cbc71c1eeb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8752)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5430: [HUDI-3979][Stacked on 5428] Optimize out mandatory columns when no merging is performed

2022-05-18 Thread GitBox


hudi-bot commented on PR #5430:
URL: https://github.com/apache/hudi/pull/5430#issuecomment-1130629292

   
   ## CI report:
   
   * 5b241061bde4ca74684f07677c7f5afa828e269c UNKNOWN
   * ddb01d5b6079cadc00947f2c8dd861632434776b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8335)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8336)
 
   * f2e15cb0a9d8cef06be2321b1346bf07e5bf6d7b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5428: [HUDI-3896] Porting Nested Schema Pruning optimization for Hudi's custom Relations

2022-05-18 Thread GitBox


hudi-bot commented on PR #5428:
URL: https://github.com/apache/hudi/pull/5428#issuecomment-1130629233

   
   ## CI report:
   
   * a0b466cb8e3f166c716f5776ac4c80f778dd9936 UNKNOWN
   * 381d2614c89dccd959934a8ff3269085b48983ee Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8334)
 
   * 5fb2606d22ec629b8def68ed348f84cbc71c1eeb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] jinyius commented on pull request #5445: [HUDI-3953]Flink Hudi module should support low-level source and sink…

2022-05-18 Thread GitBox


jinyius commented on PR #5445:
URL: https://github.com/apache/hudi/pull/5445#issuecomment-1130603396

   It looks like the source and sink surfaced by this change still operate on the RowData Flink datatype instead of being able to use other types. Will Hudi generally support only RowData, or is the plan to make further inroads into Flink's DataStream layer implementation?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] alexeykudinkin commented on pull request #5129: [HUDI-3709] Fixing `ParquetWriter` impls not respecting Parquet Max File Size limit

2022-05-18 Thread GitBox


alexeykudinkin commented on PR #5129:
URL: https://github.com/apache/hudi/pull/5129#issuecomment-1130488173

   @nsivabalan we should not be interfering with the caching on the Parquet Writer level (by manually flushing); checking the ParquetWriter for the currently accumulated buffer size is the right way to interface with it (as compared to intercepting the FileSystem writes and accounting for how many bytes were written).
   
   The issue inadvertently introduced with this approach (addressed in #5497) was that the cost of the `getDataSize` call was not factored in (it was assumed to be O(1), while in reality it is O(N) in the number of written blocks).
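   As a minimal sketch of the trade-off described above (not Hudi's actual code; `SizeTrackingWriter`, `maxFileSize` and `checkInterval` are made-up names), the expensive `getDataSize` call can be sampled every N records instead of being made per record:

   ```scala
   import org.apache.parquet.hadoop.ParquetWriter

   // Illustrative wrapper: poll the writer's accumulated size only periodically,
   // since ParquetWriter#getDataSize walks the already-written row groups (O(N)).
   class SizeTrackingWriter[T](writer: ParquetWriter[T],
                               maxFileSize: Long,
                               checkInterval: Long = 1000L) {
     private var written = 0L
     private var lastMeasuredSize = 0L

     def canWrite: Boolean = lastMeasuredSize < maxFileSize

     def write(record: T): Unit = {
       writer.write(record)
       written += 1
       if (written % checkInterval == 0) {
         // Infrequent calls amortize the O(N) cost of measuring the data size.
         lastMeasuredSize = writer.getDataSize
       }
     }
   }
   ```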


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] sstimmel commented on issue #5628: [SUPPORT] - Deltastreamer not shutting down properly

2022-05-18 Thread GitBox


sstimmel commented on issue #5628:
URL: https://github.com/apache/hudi/issues/5628#issuecomment-1130390415

   with less debugging,
   
   `22/05/18 16:03:52 INFO HoodieActiveTimeline: Loaded instants upto : 
Option{val=[20220518160348871__clean__COMPLETED]}
   22/05/18 16:03:52 INFO HoodieTimelineArchiveLog: No Instants to archive
   22/05/18 16:03:52 INFO AbstractHoodieWriteClient: Committed 20220518160141704
   22/05/18 16:03:52 INFO MapPartitionsRDD: Removing RDD 72 from persistence 
list
   22/05/18 16:03:52 INFO BlockManager: Removing RDD 72
   22/05/18 16:03:52 INFO MapPartitionsRDD: Removing RDD 57 from persistence 
list
   22/05/18 16:03:52 INFO BlockManager: Removing RDD 57
   22/05/18 16:03:52 INFO DeltaSync: Commit 20220518160141704 successful!
   22/05/18 16:03:52 INFO DeltaSync: Shutting down embedded timeline server
   22/05/18 16:03:52 INFO EmbeddedTimelineService: Closing Timeline server
   22/05/18 16:03:52 INFO TimelineService: Closing Timeline Service
   22/05/18 16:03:52 INFO Javalin: Stopping Javalin ...
   22/05/18 16:03:52 INFO Javalin: Javalin has stopped
   22/05/18 16:03:52 INFO TimelineService: Closed Timeline Service
   22/05/18 16:03:52 INFO EmbeddedTimelineService: Closed Timeline server
   22/05/18 16:03:52 INFO HoodieDeltaStreamer: Shut down delta streamer
   22/05/18 16:03:52 INFO SparkUI: Stopped Spark web UI at 
http://tenantconfig-hudi-incr-table-job:4040
   22/05/18 16:03:52 INFO KubernetesClusterSchedulerBackend: Shutting down all 
executors
   22/05/18 16:03:52 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
executor to shut down
   22/05/18 16:03:52 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client 
has been closed.
   22/05/18 16:03:53 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
   22/05/18 16:03:53 INFO MemoryStore: MemoryStore cleared
   22/05/18 16:03:53 INFO BlockManager: BlockManager stopped
   22/05/18 16:03:53 INFO BlockManagerMaster: BlockManagerMaster stopped
   22/05/18 16:03:53 INFO 
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
   22/05/18 16:03:53 INFO SparkContext: Successfully stopped SparkContext
   22/05/18 16:04:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:05:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:06:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:07:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:08:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:09:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:10:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:11:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:12:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:13:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:14:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:15:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:16:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:17:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:18:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:19:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:20:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:21:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:22:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:23:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:24:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:25:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:26:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:27:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:28:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:29:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:30:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:31:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.
   22/05/18 16:32:26 INFO CloudWatchReporter: Reporting Metrics to CloudWatch.`
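   A minimal standalone sketch (not Hudi's actual reporter wiring; the object name and message text are made up) of why a periodic metrics reporter can keep the JVM alive after the Spark context stops: a scheduler backed by non-daemon threads blocks JVM exit until it is explicitly shut down.

   ```scala
   import java.util.concurrent.{Executors, TimeUnit}

   object ReporterLifecycleSketch {
     def main(args: Array[String]): Unit = {
       // Threads created this way are non-daemon by default.
       val scheduler = Executors.newSingleThreadScheduledExecutor()
       scheduler.scheduleAtFixedRate(
         () => println("Reporting Metrics to CloudWatch."), 0L, 1L, TimeUnit.MINUTES)

       // main() returns here, but the process keeps running and "reporting" every
       // minute unless something calls scheduler.shutdown() during shutdown.
     }
   }
   ```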


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] sstimmel opened a new issue, #5628: [SUPPORT] - Deltastreamer not shutting down properly

2022-05-18 Thread GitBox


sstimmel opened a new issue, #5628:
URL: https://github.com/apache/hudi/issues/5628

   Running Deltastreamer with CloudWatch metrics isn't shutting down properly. This is in NON-continuous mode. DeltaSync and the Spark context say they are closing, but the JVM is not exiting; everything seems to be in a waiting/hung state except for CloudWatch metrics, which still tries to send metrics every minute.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.  Run Deltastreamer in non-continuous mode, with a metrics provider (only 
tested with Cloudwatch).  
   
   
   **Expected behavior**
   
   Expect that everything shuts down and the JVM exits. Deltastreamer runs a single sync, but the JVM never fully exits, so Kubernetes doesn't complete the job.
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version : 2.12
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :  S3
   
   * Running on Docker? (yes/no) : yes (Kubernetes EKS job)
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```22/05/18 18:37:25 DEBUG PoolingHttpClientConnectionManager: Connection 
[id: 27][route: {s}->https://sqs.us-east-1.amazonaws.com:443] can be kept alive 
for 60.0 seconds
   22/05/18 18:37:25 DEBUG DefaultManagedHttpClientConnection: 
http-outgoing-27: set socket timeout to 0
   22/05/18 18:37:25 DEBUG PoolingHttpClientConnectionManager: Connection 
released: [id: 27][route: {s}->https://sqs.us-east-1.amazonaws.com:443][total 
available: 1; route allocated: 1 of 50; total allocated: 1 of 50]
   22/05/18 18:37:25 DEBUG request: Received successful response: 200, AWS 
Request ID: 354cb537-78f6-5937-87a1-efc847d357ff
   22/05/18 18:37:25 DEBUG requestId: x-amzn-RequestId: 
354cb537-78f6-5937-87a1-efc847d357ff
   22/05/18 18:37:25 INFO CloudObjectsSelector: Successfully deleted 2 messages 
from queue.
   22/05/18 18:37:25 INFO DeltaSync: Shutting down embedded timeline server
   22/05/18 18:37:25 INFO EmbeddedTimelineService: Closing Timeline server
   22/05/18 18:37:25 INFO TimelineService: Closing Timeline Service
   22/05/18 18:37:25 INFO Javalin: Stopping Javalin ...
   22/05/18 18:37:25 DEBUG AbstractEndPoint: close 
SocketChannelEndPoint@6df6feed{/10.21.162.96:49092<->/10.21.162.96:36215,OPEN,fill=FI,flush=-,to=3643/3}{io=1/1,kio=1,kro=1}->HttpConnection@553bdf11[p=HttpParser{s=START,0
 of 
-1},g=HttpGenerator@fe052cd{s=START}]=>HttpChannelOverHttp@6207e95f{r=4,c=false,c=false/false,a=IDLE,uri=null,age=0}
   22/05/18 18:37:25 DEBUG AbstractEndPoint: close(null) 
SocketChannelEndPoint@6df6feed{/10.21.162.96:49092<->/10.21.162.96:36215,OPEN,fill=FI,flush=-,to=3643/3}{io=1/1,kio=1,kro=1}->HttpConnection@553bdf11[p=HttpParser{s=START,0
 of 
-1},g=HttpGenerator@fe052cd{s=START}]=>HttpChannelOverHttp@6207e95f{r=4,c=false,c=false/false,a=IDLE,uri=null,age=0}
   22/05/18 18:37:25 DEBUG ChannelEndPoint: doClose 
SocketChannelEndPoint@6df6feed{/10.21.162.96:49092<->/10.21.162.96:36215,CLOSED,fill=FI,flush=-,to=3643/3}{io=1/1,kio=1,kro=1}->HttpConnection@553bdf11[p=HttpParser{s=START,0
 of 
-1},g=HttpGenerator@fe052cd{s=START}]=>HttpChannelOverHttp@6207e95f{r=4,c=false,c=false/false,a=IDLE,uri=null,age=0}
   22/05/18 18:37:25 DEBUG WriteFlusher: ignored: 
WriteFlusher@19b123e1{IDLE}->null
   java.nio.channels.ClosedChannelException
at 
org.apache.hudi.org.apache.jetty.io.WriteFlusher.onClose(WriteFlusher.java:502)
at 
org.apache.hudi.org.apache.jetty.io.AbstractEndPoint.onClose(AbstractEndPoint.java:353)
at 
org.apache.hudi.org.apache.jetty.io.ChannelEndPoint.onClose(ChannelEndPoint.java:215)
at 
org.apache.hudi.org.apache.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:225)
at 
org.apache.hudi.org.apache.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:192)
at 
org.apache.hudi.org.apache.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:175)
at 
org.apache.hudi.org.apache.jetty.io.AbstractConnection.close(AbstractConnection.java:248)
at 
org.apache.hudi.org.apache.jetty.io.ManagedSelector.closeNoExceptions(ManagedSelector.java:252)
at 
org.apache.hudi.org.apache.jetty.io.ManagedSelector.access$1400(ManagedSelector.java:61)
at 
org.apache.hudi.org.apache.jetty.io.ManagedSelector$CloseConnections.update(ManagedSelector.java:868)
at 
org.apache.hudi.org.apache.jetty.io.ManagedSelector$SelectorProducer.processUpdates(ManagedSelector.java:428)
at 
org.apache.hudi.org.apache.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:399)
at 
org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
at 
org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
at 
org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.tryProduce

[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Initialize hudi metastore module.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1130354681

   
   ## CI report:
   
   * 51c78ac36787499f78b683a557f05115d5a77c66 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8750)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5627: [HUDI-3350][HUDI-3351] Rebase Record combining semantic into `HoodieRecordCombiningEngine`

2022-05-18 Thread GitBox


hudi-bot commented on PR #5627:
URL: https://github.com/apache/hudi/pull/5627#issuecomment-1130285737

   
   ## CI report:
   
   * 7af9bab093e7dffc60aece12ea9d0ed819ad90e5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8751)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

2022-05-18 Thread GitBox


hudi-bot commented on PR #5620:
URL: https://github.com/apache/hudi/pull/5620#issuecomment-1130285688

   
   ## CI report:
   
   * 21b8e9510a4feed0c19f8ebdac906a23fad8b202 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8747)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5627: [HUDI-3350][HUDI-3351] Rebase Record combining semantic into `HoodieRecordCombiningEngine`

2022-05-18 Thread GitBox


hudi-bot commented on PR #5627:
URL: https://github.com/apache/hudi/pull/5627#issuecomment-1130278520

   
   ## CI report:
   
   * 0b30db620e174195a03d499a36aa7a22b2c77c54 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8749)
 
   * 7af9bab093e7dffc60aece12ea9d0ed819ad90e5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8751)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5627: [HUDI-3350][HUDI-3351] Rebase Record combining semantic into `HoodieRecordCombiningEngine`

2022-05-18 Thread GitBox


hudi-bot commented on PR #5627:
URL: https://github.com/apache/hudi/pull/5627#issuecomment-1130227616

   
   ## CI report:
   
   * 0b30db620e174195a03d499a36aa7a22b2c77c54 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8749)
 
   * 7af9bab093e7dffc60aece12ea9d0ed819ad90e5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5627: [HUDI-3350][HUDI-3351] Rebase Record combining semantic into `HoodieRecordCombiningEngine`

2022-05-18 Thread GitBox


hudi-bot commented on PR #5627:
URL: https://github.com/apache/hudi/pull/5627#issuecomment-1130224046

   
   ## CI report:
   
   * 0b30db620e174195a03d499a36aa7a22b2c77c54 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8749)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Initialize hudi metastore module.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1130222961

   
   ## CI report:
   
   * 2dfad790b5c24bc43485e8a80a32a961a65b126a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8748)
 
   * 51c78ac36787499f78b683a557f05115d5a77c66 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8750)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] fengjian428 commented on issue #5074: [SUPPORT] Flink use different record_key format from spark

2022-05-18 Thread GitBox


fengjian428 commented on issue #5074:
URL: https://github.com/apache/hudi/issues/5074#issuecomment-1130213190

   > 
   
   
![image](https://user-images.githubusercontent.com/4403474/169089471-d62d64dd-6e4d-41c0-b19e-8793714a799e.png)
   
   actually we used spark-sql to create the table, ingested data with flink-sql, and then deleted data using spark-sql
   
   I went through the code and found that Flink checks the record key length: if it equals 1 it uses SimpleKeyGenerator, but Spark always uses ComplexKeyGenerator whether the length equals 1 or not
   
![image](https://user-images.githubusercontent.com/4403474/169090145-0d5d796c-330b-48ec-a1f7-d1d43a9c2565.png)
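   For illustration only (a simplified sketch of the behavior described above, not the actual Hudi code paths), the divergence boils down to something like:

   ```scala
   object KeyGenSelectionSketch {
     // Flink-side behavior as described: a single record key field falls back to the
     // simple key generator, producing keys like "id1".
     def flinkKeyGenerator(recordKeyFields: Seq[String]): String =
       if (recordKeyFields.size == 1) "org.apache.hudi.keygen.SimpleKeyGenerator"
       else "org.apache.hudi.keygen.ComplexKeyGenerator"

     // Spark-SQL-side behavior as described: always the complex key generator,
     // producing keys like "id:id1" even for a single key field.
     def sparkSqlKeyGenerator(recordKeyFields: Seq[String]): String =
       "org.apache.hudi.keygen.ComplexKeyGenerator"
   }
   ```

   With a single key field the two engines would then write different record key values, which is why deletes issued from one engine can miss rows written by the other.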
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5627: [HUDI-3350][HUDI-3351] Rebase Record combining semantic into `HoodieRecordCombiningEngine`

2022-05-18 Thread GitBox


hudi-bot commented on PR #5627:
URL: https://github.com/apache/hudi/pull/5627#issuecomment-1130184417

   
   ## CI report:
   
   * 0b30db620e174195a03d499a36aa7a22b2c77c54 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8749)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5627: [HUDI-3350][HUDI-3351] Rebase Record combining semantic into `HoodieRecordCombiningEngine`

2022-05-18 Thread GitBox


hudi-bot commented on PR #5627:
URL: https://github.com/apache/hudi/pull/5627#issuecomment-1130180419

   
   ## CI report:
   
   * 0b30db620e174195a03d499a36aa7a22b2c77c54 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-3391) presto and hive beeline fails to read MOR table w/ 2 or more array fields

2022-05-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3391:
-
Fix Version/s: 0.12.0
   (was: 0.11.0)

> presto and hive beeline fails to read MOR table w/ 2 or more array fields
> -
>
> Key: HUDI-3391
> URL: https://issues.apache.org/jira/browse/HUDI-3391
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: reader-core, trino-presto
>Reporter: sivabalan narayanan
>Assignee: Sagar Sumit
>Priority: Critical
> Fix For: 0.12.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> We have an issue reported by a user
> [here|https://github.com/apache/hudi/issues/2657]. Looks like w/ 0.10.0 or
> later, spark datasource read works, but hive beeline does not work. Even
> spark.sql (hive table) querying works as well.
> Another related ticket: 
> [https://github.com/apache/hudi/issues/3834#issuecomment-997307677]
>  
> Steps that I tried:
> [https://gist.github.com/nsivabalan/fdb8794104181f93b9268380c7f7f079]
> From beeline, you will encounter below exception
> {code:java}
> Failed with exception 
> java.io.IOException:org.apache.hudi.org.apache.avro.SchemaParseException: 
> Can't redefine: array {code}
> All linked tickets state that upgrading parquet to 1.11.0 or greater should
> work. We need to try it out w/ latest master and go from there.
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HUDI-3350) Create Engine-specific Implementations of `HoodieRecord`

2022-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3350:
-
Labels: pull-request-available  (was: )

> Create Engine-specific Implementations of `HoodieRecord`
> 
>
> Key: HUDI-3350
> URL: https://issues.apache.org/jira/browse/HUDI-3350
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> To achieve goals of RFC-46 `HoodieRecord` will have to hold internal 
> representations of the records based on the Engine Hudi is being used with.
> For that we need to split `HoodieRecord` into "interface" (or base-class) and 
> engine-specific implementations (holding internal engine-specific 
> representation of the payload).
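As a rough sketch of that split (hypothetical, heavily simplified class shapes, not the actual Hudi types), the base class stops fixing the payload representation and an engine-specific subclass holds the engine-native row directly:

```scala
import org.apache.hudi.common.model.HoodieKey
import org.apache.spark.sql.catalyst.InternalRow

// Simplified base: the record's data type is engine-specific.
abstract class EngineAwareRecord[T](val key: HoodieKey) {
  def getData: T
}

// Spark-specific implementation holding Catalyst's InternalRow, avoiding the
// Avro round-trip that a payload-based record would otherwise require.
class SparkEngineRecord(key: HoodieKey, row: InternalRow)
  extends EngineAwareRecord[InternalRow](key) {
  override def getData: InternalRow = row
}
```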



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [hudi] wulei0302 opened a new pull request, #5627: [HUDI-3350][HUDI-3351] Rebase Record combining semantic into `HoodieRecordCombiningEngine`

2022-05-18 Thread GitBox


wulei0302 opened a new pull request, #5627:
URL: https://github.com/apache/hudi/pull/5627

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Step2 (RFC - 46):HoodieRecordCombiningEngine and its bridge and spark record
   Based on [Step1](https://github.com/apache/hudi/pull/5522)  , Step2 adds the 
HoodieRecordCombiningEngine API and the HoodieSparkRecord
   
   ## Brief change log
   
 - Rebase Record combining semantic into `HoodieRecordCombiningEngine`
 - Create Engine-specific Implementations of `HoodieRecord`
   
   Jira:
   [HUDI-3350](https://issues.apache.org/jira/browse/HUDI-3350)
   [HUDI-3351](https://issues.apache.org/jira/browse/HUDI-3351)
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Initialize hudi metastore module.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1130163649

   
   ## CI report:
   
   * 2dfad790b5c24bc43485e8a80a32a961a65b126a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8748)
 
   * 51c78ac36787499f78b683a557f05115d5a77c66 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xushiyan commented on issue #5223: [SUPPORT] - HUDI clustering - read issues

2022-05-18 Thread GitBox


xushiyan commented on issue #5223:
URL: https://github.com/apache/hudi/issues/5223#issuecomment-1130157997

   > @nsivabalan I tried out both 0.8.0 and 0.10.1 versions. My job is not 
returning duplicates and considering only the latest files. I tried on both 
partitioned and non-partitioned tables as well. Can the issue be due to any 
custom code from the AWS side? https://user-images.githubusercontent.com/20996567/165210088-42df9491-d576-4f31-9edc-7c1e20a2ee3f.png
   
   @suryaprasanna have you filed aws support case? this should be followed up 
with aws support team then


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (HUDI-4102) Optimization of bucket index in Flink

2022-05-18 Thread XiaoyuGeng (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaoyuGeng resolved HUDI-4102.
--

> Optimization of bucket index in Flink
> -
>
> Key: HUDI-4102
> URL: https://issues.apache.org/jira/browse/HUDI-4102
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: index
>Reporter: HunterHunter
>Assignee: XiaoyuGeng
>Priority: Major
> Fix For: 0.11.1, 0.12.0
>
> Attachments: write.task=80,bucket=1.png
>
>
> write.task value can only be less than or equal to the bucket num value, because
> even if it is exceeded, the writing efficiency cannot be improved. This is
> because the bucket index does not take the partition into account in partitionCustom



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HUDI-4102) Optimization of bucket index in Flink

2022-05-18 Thread XiaoyuGeng (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538908#comment-17538908
 ] 

XiaoyuGeng commented on HUDI-4102:
--

fixed by https://issues.apache.org/jira/browse/HUDI-4101

> Optimization of bucket index in Flink
> -
>
> Key: HUDI-4102
> URL: https://issues.apache.org/jira/browse/HUDI-4102
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: index
>Reporter: HunterHunter
>Assignee: XiaoyuGeng
>Priority: Major
> Fix For: 0.11.1, 0.12.0
>
> Attachments: write.task=80,bucket=1.png
>
>
> write.task value can only be less than or equal to the bucket num value, because
> even if it is exceeded, the writing efficiency cannot be improved. This is
> because the bucket index does not take the partition into account in partitionCustom



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [hudi] hudi-bot commented on pull request #5587: [HUDI-3890] Fix rat plugin issue

2022-05-18 Thread GitBox


hudi-bot commented on PR #5587:
URL: https://github.com/apache/hudi/pull/5587#issuecomment-1130156399

   
   ## CI report:
   
   * 6083ca5d84f85006f37ffe03d8a00e7c5113120e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8746)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Initialize hudi metastore module.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1130155372

   
   ## CI report:
   
   * 33a3d7398a834b435d4922be51d66801aa2b987c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8743)
 
   * 2dfad790b5c24bc43485e8a80a32a961a65b126a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8748)
 
   * 51c78ac36787499f78b683a557f05115d5a77c66 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-2786) Failed to connect to namenode in Docker Demo on Apple M1 chip

2022-05-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2786:
-
Fix Version/s: 0.12.0

> Failed to connect to namenode in Docker Demo on Apple M1 chip
> -
>
> Key: HUDI-2786
> URL: https://issues.apache.org/jira/browse/HUDI-2786
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.12.0
>
>
> {code:java}
> > ./setup_demo.sh 
> [+] Running 1/0
>  ⠿ compose  Warning: No resource found to remove  0.0s
> [+] Running 15/15
>  ⠿ namenode Pulled  1.4s
>  ⠿ kafka Pulled  1.3s
>  ⠿ presto-worker-1 Pulled  1.3s
>  ⠿ historyserver Pulled  1.4s
>  ⠿ adhoc-2 Pulled  1.3s
>  ⠿ adhoc-1 Pulled  1.4s
>  ⠿ graphite Pulled  1.3s
>  ⠿ sparkmaster Pulled  1.3s
>  ⠿ hive-metastore-postgresql Pulled  1.3s
>  ⠿ presto-coordinator-1 Pulled  1.3s
>  ⠿ spark-worker-1 Pulled  1.4s
>  ⠿ hiveserver Pulled  1.3s
>  ⠿ hivemetastore Pulled  1.4s
>  ⠿ zookeeper Pulled  1.3s
>  ⠿ datanode1 Pulled  1.3s
> [+] Running 16/16
>  ⠿ Network compose_default  Created  0.0s
>  ⠿ Container hive-metastore-postgresql  Started  1.1s
>  ⠿ Container kafkabroker  Started  1.1s
>  ⠿ Container zookeeper  Started  1.1s
>  ⠿ Container namenode

[GitHub] [hudi] xushiyan commented on issue #5280: [SUPPORT] Docker Demo: Failed to Connect to namenode

2022-05-18 Thread GitBox


xushiyan commented on issue #5280:
URL: https://github.com/apache/hudi/issues/5280#issuecomment-1130136341

   @yihua we should aim to fix this with arm64 docker images. I'll tag the 
ticket for 0.12


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] minihippo commented on issue #5589: [SUPPORT] Optimization of bucket index in Flink

2022-05-18 Thread GitBox


minihippo commented on issue #5589:
URL: https://github.com/apache/hudi/issues/5589#issuecomment-1130127980

   @LinMingQiang already fix https://issues.apache.org/jira/browse/HUDI-4101


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HUDI-3893) Add support to refresh hoodie.properties at regular intervals

2022-05-18 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538887#comment-17538887
 ] 

Raymond Xu commented on HUDI-3893:
--

[~shivnarayan] we discussed this: we don't want people to set a TTL on the hudi 
table, as it simply corrupts the data. Can we close this?

> Add support to refresh hoodie.properties at regular intervals
> -
>
> Key: HUDI-3893
> URL: https://issues.apache.org/jira/browse/HUDI-3893
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: sivabalan narayanan
>Priority: Critical
> Fix For: 0.12.0
>
>
> in cloud stores, users could set up lifecycle policy to delete files which 
> are not touched for say 30 days. So, wrt "hoodie.properties" which is created 
> once and never updated for the most part, it could get caught with the 
> lifecycle policy. We can ask users not to set the lifecycle policy, but would 
> be good to add support to hoodie to make it resilient. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [hudi] minihippo commented on pull request #5064: [HUDI-3654] Initialize hudi metastore module.

2022-05-18 Thread GitBox


minihippo commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1130122623

   The CI environment lacks thrift under `/usr/local/bin/thrift`, so hudi-metastore can't be compiled. I pushed the compiled thrift classes as a temporary way to pass CI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] minihippo commented on issue #5589: [SUPPORT] Optimization of bucket index in Flink

2022-05-18 Thread GitBox


minihippo commented on issue #5589:
URL: https://github.com/apache/hudi/issues/5589#issuecomment-1130112534

   get this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] bkosuru commented on issue #5569: [SUPPORT] Issues with URL_ENCODE_PARTITIONING_OPT_KEY in hudi 0.11.0

2022-05-18 Thread GitBox


bkosuru commented on issue #5569:
URL: https://github.com/apache/hudi/issues/5569#issuecomment-1130108911

   ```
   Another issue: In the spark reader, we have to change 
   val urlEncodedGraph = 
URLEncoder.encode(s"", 
StandardCharsets.UTF_8.toString)
   to
   val urlEncodedGraph = 
PartitionPathEncodeUtils.escapePathName("")
   
   to make the incremental query work
   
   val tmp = spark.read.format("hudi")
   .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, 
DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
   .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, 
"20220511133204671")
   .option("hoodie.file.index.enable", false)
.option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, 
s"/g=$urlEncodedGraph/p=*")
.load("/testing/hudi_11/spog")
   ```

   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Initialize hudi metastore module.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1130108649

   
   ## CI report:
   
   * 33a3d7398a834b435d4922be51d66801aa2b987c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8743)
 
   * 2dfad790b5c24bc43485e8a80a32a961a65b126a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8748)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Initialize hudi metastore module.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1130104309

   
   ## CI report:
   
   * 33a3d7398a834b435d4922be51d66801aa2b987c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8743)
 
   * 2dfad790b5c24bc43485e8a80a32a961a65b126a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] JosefinaArayaTapia commented on issue #5389: [SUPPORT] - AWS EMR and Glue Catalog

2022-05-18 Thread GitBox


JosefinaArayaTapia commented on issue #5389:
URL: https://github.com/apache/hudi/issues/5389#issuecomment-1130096584

   Hi @xushiyan 
   
   I presented the case to AWS support and they sent me the following configuration, which solved my problem.
   I also now use EMR 6.4.0
   
   
   ```
   # New options - the change here uses ComplexKeyGenerator instead of SimpleKeyGenerator, and more than one column in the record key field
   
   hudiOptions = {
   'hoodie.datasource.write.precombine.field':'last_update_time',
   'hoodie.datasource.write.recordkey.field': 'id,creation_date', 
   'hoodie.table.name': 'newhuditest0439', 
   'hoodie.datasource.hive_sync.mode':'hms', 
   'hoodie.datasource.write.hive_style_partitioning':'true', 
   'hoodie.compact.inline.max.delta.commits':1, 
   'hoodie.compact.inline.trigger.strategy':'NUM_COMMITS', 
   'hoodie.datasource.compaction.async.enable':'false', 
   'hoodie.datasource.write.table.type':'COPY_ON_WRITE', 
   'hoodie.index.type':'GLOBAL_BLOOM', 
   'hoodie.datasource.hive_sync.partition_extractor_class': 
'org.apache.hudi.hive.MultiPartKeysValueExtractor', 
   
'hoodie.datasource.write.keygenerator.class':'org.apache.hudi.keygen.ComplexKeyGenerator',
 
   'hoodie.bloom.index.filter.type':'DYNAMIC_V0', 
   'hoodie.bloom.index.update.partition.path': 'false', 
   'hoodie.datasource.hive_sync.table':'newhuditest0439', 
   'hoodie.datasource.hive_sync.enable':'true', 
   'hoodie.datasource.write.partitionpath.field':'creation_date', 
   'hoodie.datasource.hive_sync.partition_fields':'creation_date', 
   'hoodie.datasource.hive_sync.database':'default', 
   'hoodie.datasource.hive_sync.support_timestamp': 'true'
   } 
   
   ```
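   A minimal usage sketch (a Scala translation of a few of the options above; `df`, `spark` and the target path are placeholders, not from the original report) showing how such a map is passed on a Hudi write:

   ```scala
   import org.apache.spark.sql.SaveMode

   val hudiOptions = Map(
     "hoodie.table.name" -> "newhuditest0439",
     "hoodie.datasource.write.recordkey.field" -> "id,creation_date",
     "hoodie.datasource.write.precombine.field" -> "last_update_time",
     "hoodie.datasource.write.partitionpath.field" -> "creation_date",
     "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator"
     // ... remaining options from the block above
   )

   df.write.format("hudi")
     .options(hudiOptions)
     .mode(SaveMode.Append)
     .save("s3://my-bucket/newhuditest0439/")   // placeholder target path
   ```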
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xushiyan commented on issue #5388: Error in query: Cannot partition by nested column: meta.source / java.lang.IllegalArgumentException: Can't find preCombineKey `meta.lastUpdated` in r

2022-05-18 Thread GitBox


xushiyan commented on issue #5388:
URL: https://github.com/apache/hudi/issues/5388#issuecomment-1130092897

   @santoshsb this is caused by some discrepancies btw spark sql and spark data 
source option. we should fix this in 0.11.1. Spark data source allows using 
nested field for partition and precombine field. but spark sql does some 
validation which restricts it. The validation logic is here (to be relaxed)
   https://github.com/apache/hudi/pull/5517
   
   cc @yihua 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

2022-05-18 Thread GitBox


hudi-bot commented on PR #5620:
URL: https://github.com/apache/hudi/pull/5620#issuecomment-1130087894

   
   ## CI report:
   
   * 1378f6ce813b4fea31f1408c615e2449198ac970 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8732)
 
   * 21b8e9510a4feed0c19f8ebdac906a23fad8b202 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8747)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-4111) Bump ANTLR runtime version in Spark 3.x

2022-05-18 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated HUDI-4111:
-
Summary: Bump ANTLR runtime version in Spark 3.x  (was: Bump ANTLR runtime 
version to 4.8 in Spark 3.2)

> Bump ANTLR runtime version in Spark 3.x
> ---
>
> Key: HUDI-4111
> URL: https://issues.apache.org/jira/browse/HUDI-4111
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: dzcxzl
>Priority: Trivial
>  Labels: pull-request-available
>
> Spark3.2 uses antlr version 4.8, Hudi uses 4.7, use the same version to avoid 
> a log of antlr check versions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [hudi] xushiyan commented on issue #5389: [SUPPORT] - AWS EMR and Glue Catalog

2022-05-18 Thread GitBox


xushiyan commented on issue #5389:
URL: https://github.com/apache/hudi/issues/5389#issuecomment-1130084315

   @JosefinaArayaTapia have you filed aws support case? this is for aws hudi 
and aws-specific environment, should be troubleshooted with aws support team


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

2022-05-18 Thread GitBox


hudi-bot commented on PR #5620:
URL: https://github.com/apache/hudi/pull/5620#issuecomment-1130083583

   
   ## CI report:
   
   * 1378f6ce813b4fea31f1408c615e2449198ac970 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8732)
 
   * 21b8e9510a4feed0c19f8ebdac906a23fad8b202 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5618: [HUDI-3555] Re-use spark config for parquet timestamp format

2022-05-18 Thread GitBox


hudi-bot commented on PR #5618:
URL: https://github.com/apache/hudi/pull/5618#issuecomment-1130079151

   
   ## CI report:
   
   * 5e884a7cf8a35cf80af7b58b4c0f17bfb7b8b523 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8744)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5587: [HUDI-3890] Fix rat plugin issue

2022-05-18 Thread GitBox


hudi-bot commented on PR #5587:
URL: https://github.com/apache/hudi/pull/5587#issuecomment-1130079026

   
   ## CI report:
   
   * c79543ec973cd476f83e46e277211f9ce667006f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8738)
 
   * 6083ca5d84f85006f37ffe03d8a00e7c5113120e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8746)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xushiyan commented on issue #5484: [SUPPORT] Hive Sync + AWS Data Catalog failling with Hudi 0.11.0

2022-05-18 Thread GitBox


xushiyan commented on issue #5484:
URL: https://github.com/apache/hudi/issues/5484#issuecomment-1130052056

   @jasondavindev have you filed aws support case? this is specific to aws 
environment so it should be investigated from aws side. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5587: [HUDI-3890] Fix rat plugin issue

2022-05-18 Thread GitBox


hudi-bot commented on PR #5587:
URL: https://github.com/apache/hudi/pull/5587#issuecomment-1130034153

   
   ## CI report:
   
   * c79543ec973cd476f83e46e277211f9ce667006f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8738)
 
   * 6083ca5d84f85006f37ffe03d8a00e7c5113120e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] huberylee commented on a diff in pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

2022-05-18 Thread GitBox


huberylee commented on code in PR #5620:
URL: https://github.com/apache/hudi/pull/5620#discussion_r875907370


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestClusteringProcedure.scala:
##
@@ -325,7 +335,7 @@ class TestClusteringProcedure extends 
HoodieSparkSqlTestBase {
   assertResult(3)(clusteringPlan.get().getInputGroups.size())
 
   // No pending clustering instant
-  checkAnswer(s"call show_clustering(table => '$tableName')")()
+  spark.sql(s"call show_clustering(table => '$tableName')").show()

Review Comment:
   I will change this test case to check more of the returned result info



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xushiyan commented on issue #5493: [SUPPORT] Hudi Batch job failures for different tables randomly every run

2022-05-18 Thread GitBox


xushiyan commented on issue #5493:
URL: https://github.com/apache/hudi/issues/5493#issuecomment-1130014755

   > looks like you are using 0.5.0 version of hudi. I would highly recommend migrating to the latest version. Might be tough for us to put in a fix even if we root cause it (we don't backport issues in general).
   
   +1. In addition, 0.5.0 version is not certified to work with 
   
   > Hive version : 3.1.2
   > Hadoop version : 3.2.1
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Initialize hudi metastore module.

2022-05-18 Thread GitBox


hudi-bot commented on PR #5064:
URL: https://github.com/apache/hudi/pull/5064#issuecomment-1130008250

   
   ## CI report:
   
   * 33a3d7398a834b435d4922be51d66801aa2b987c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8743)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] raviMoengage commented on issue #5565: [SUPPORT] Async compaction is not triggered on the MOR Hudi table using spark streaming

2022-05-18 Thread GitBox


raviMoengage commented on issue #5565:
URL: https://github.com/apache/hudi/issues/5565#issuecomment-1130005623

   Hi @nsivabalan, after checking again I found out there were 2 `compaction.inflight` timeline instants.
   
   So I changed the base path and tried the following config and it's working 
fine.
   ```
   option("hoodie.compact.schedule.inline","true").
   option("hoodie.compact.inline.max.delta.commits","1").
   ```
   For the 2 inflight compactions, I had to run offline compaction in execute 
mode with instant-time.
   
   Thanks for the support. 
   
   Closing the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] raviMoengage closed issue #5565: [SUPPORT] Async compaction is not triggered on the MOR Hudi table using spark streaming

2022-05-18 Thread GitBox


raviMoengage closed issue #5565: [SUPPORT] Async compaction is not triggered on 
the MOR Hudi table using spark streaming
URL: https://github.com/apache/hudi/issues/5565





[GitHub] [hudi] aiwenmo closed issue #5513: [SUPPORT] Sync realtime whole mysql database to hudi failed when using flink datastream api

2022-05-18 Thread GitBox


aiwenmo closed issue #5513: [SUPPORT] Sync realtime whole mysql database to 
hudi failed when using flink datastream api
URL: https://github.com/apache/hudi/issues/5513





[GitHub] [hudi] xushiyan commented on issue #5513: [SUPPORT] Sync realtime whole mysql database to hudi failed when using flink datastream api

2022-05-18 Thread GitBox


xushiyan commented on issue #5513:
URL: https://github.com/apache/hudi/issues/5513#issuecomment-1129966941

   > > You need to set up the key generator clazz correctly.
   > 
   > thx. Your method is also OK.
   
   @aiwenmo did the suggestion solve the issue? If so, let's close this.





[GitHub] [hudi] hudi-bot commented on pull request #5625: [HUDI-4118] fix sync hive Hive table already exists throw an exception

2022-05-18 Thread GitBox


hudi-bot commented on PR #5625:
URL: https://github.com/apache/hudi/pull/5625#issuecomment-1129958250

   
   ## CI report:
   
   * 673d5d86837dd907d872f9134ffdcfd4eeb7b156 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8739)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] XuQianJin-Stars commented on a diff in pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

2022-05-18 Thread GitBox


XuQianJin-Stars commented on code in PR #5620:
URL: https://github.com/apache/hudi/pull/5620#discussion_r875842983


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestCompactionProcedure.scala:
##
@@ -48,22 +48,48 @@ class TestCompactionProcedure extends 
HoodieSparkSqlTestBase {
   spark.sql(s"insert into $tableName values(4, 'a4', 10, 1000)")
   spark.sql(s"update $tableName set price = 11 where id = 1")
 
-  spark.sql(s"call run_compaction(op => 'schedule', table => 
'$tableName')")
+  // Schedule the first compaction
+  val firstResult = spark.sql(s"call run_compaction(op => 'schedule', 
table => '$tableName')")
+.collect()
+.map(row => Seq(row.getString(0), row.getInt(1), row.getString(2)))
+
   spark.sql(s"update $tableName set price = 12 where id = 2")
-  spark.sql(s"call run_compaction('schedule', '$tableName')")
-  val compactionRows = spark.sql(s"call show_compaction(table => 
'$tableName', limit => 10)").collect()
-  val timestamps = compactionRows.map(_.getString(0))
+
+  // Schedule the second compaction
+  val secondResult = spark.sql(s"call run_compaction('schedule', 
'$tableName')")
+.collect()
+.map(row => Seq(row.getString(0), row.getInt(1), row.getString(2)))
+
+  assertResult(1)(firstResult.length)
+  assertResult(1)(secondResult.length)
+  val showCompactionSql: String = s"call show_compaction(table => 
'$tableName', limit => 10)"
+  checkAnswer(showCompactionSql)(
+firstResult(0),
+secondResult(0)
+  )
+
+  val compactionRows = spark.sql(showCompactionSql).collect()
+  val timestamps = compactionRows.map(_.getString(0)).sorted
   assertResult(2)(timestamps.length)
 
-  spark.sql(s"call run_compaction(op => 'run', table => '$tableName', 
timestamp => ${timestamps(1)})")
+  // Execute the second scheduled compaction instant actually
+  spark.sql(s"call run_compaction(op => 'run', table => '$tableName', 
timestamp => ${timestamps(1)})").show()

Review Comment:
   Ditto






[GitHub] [hudi] XuQianJin-Stars commented on a diff in pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

2022-05-18 Thread GitBox


XuQianJin-Stars commented on code in PR #5620:
URL: https://github.com/apache/hudi/pull/5620#discussion_r875842634


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestCompactionProcedure.scala:
##
@@ -98,25 +124,37 @@ class TestCompactionProcedure extends 
HoodieSparkSqlTestBase {
   spark.sql(s"insert into $tableName values(3, 'a3', 10, 1000)")
   spark.sql(s"update $tableName set price = 11 where id = 1")
 
-  spark.sql(s"call run_compaction(op => 'run', path => 
'${tmp.getCanonicalPath}')")
+  spark.sql(s"call run_compaction(op => 'run', path => 
'${tmp.getCanonicalPath}')").show()
   checkAnswer(s"select id, name, price, ts from $tableName order by id")(
 Seq(1, "a1", 11.0, 1000),
 Seq(2, "a2", 10.0, 1000),
 Seq(3, "a3", 10.0, 1000)
   )
   assertResult(0)(spark.sql(s"call show_compaction(path => 
'${tmp.getCanonicalPath}')").collect().length)
-  // schedule compaction first
+
   spark.sql(s"update $tableName set price = 12 where id = 1")
-  spark.sql(s"call run_compaction(op=> 'schedule', path => 
'${tmp.getCanonicalPath}')")
 
-  // schedule compaction second
+  // Schedule the first compaction
+  val firstResult = spark.sql(s"call run_compaction(op=> 'schedule', path 
=> '${tmp.getCanonicalPath}')")
+.collect()
+.map(row => Seq(row.getString(0), row.getInt(1), row.getString(2)))
+
   spark.sql(s"update $tableName set price = 12 where id = 2")
-  spark.sql(s"call run_compaction(op => 'schedule', path => 
'${tmp.getCanonicalPath}')")
 
-  // show compaction
-  assertResult(2)(spark.sql(s"call show_compaction(path => 
'${tmp.getCanonicalPath}')").collect().length)
-  // run compaction for all the scheduled compaction
-  spark.sql(s"call run_compaction(op => 'run', path => 
'${tmp.getCanonicalPath}')")
+  // Schedule the second compaction
+  val secondResult = spark.sql(s"call run_compaction(op => 'schedule', 
path => '${tmp.getCanonicalPath}')")
+.collect()
+.map(row => Seq(row.getString(0), row.getInt(1), row.getString(2)))
+
+  assertResult(1)(firstResult.length)
+  assertResult(1)(secondResult.length)
+  checkAnswer(s"call show_compaction(path => '${tmp.getCanonicalPath}')")(
+firstResult(0),
+secondResult(0)
+  )
+
+  // Run compaction for all the scheduled compaction
+  spark.sql(s"call run_compaction(op => 'run', path => 
'${tmp.getCanonicalPath}')").show()

Review Comment:
   Ditto






[GitHub] [hudi] XuQianJin-Stars commented on a diff in pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

2022-05-18 Thread GitBox


XuQianJin-Stars commented on code in PR #5620:
URL: https://github.com/apache/hudi/pull/5620#discussion_r875842085


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestClusteringProcedure.scala:
##
@@ -325,7 +335,7 @@ class TestClusteringProcedure extends 
HoodieSparkSqlTestBase {
   assertResult(3)(clusteringPlan.get().getInputGroups.size())
 
   // No pending clustering instant
-  checkAnswer(s"call show_clustering(table => '$tableName')")()
+  spark.sql(s"call show_clustering(table => '$tableName')").show()

Review Comment:
   Why change `checkAnswer` to `spark.sql`?





