[GitHub] [hudi] yihua merged pull request #4268: [HUDI-2970] Adding tests for archival of replace commit actions

2021-12-18 Thread GitBox


yihua merged pull request #4268:
URL: https://github.com/apache/hudi/pull/4268


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833] Clean up unused archive files instead of expanding indefinitely.

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-997345262


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * 9b9620a298b45a57af6e596c9305a49ccc69345a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4432)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4427)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4457)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4472)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4548)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833] Clean up unused archive files instead of expanding indefinitely.

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-997153336


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * 9b9620a298b45a57af6e596c9305a49ccc69345a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4432)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4427)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4457)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4472)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4346: [HUDI-3045] New ClusteringPlanStrategy to use regex choose partitions when building clustering plan

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4346:
URL: https://github.com/apache/hudi/pull/4346#issuecomment-997153375


   
   ## CI report:
   
   * 2227d98a76c74d94538a57467fe4d72f0a0daeae Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4399)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4406)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4408)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4425)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4430)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4435)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4458)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4473)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997344999


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   * 75f37f592493e7cfa4ac52ceb5c9dbb99eb1e59b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4541)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4546)
 
   * d1410a1f18f89bfaacd2ba5fff3ca564d45e1699 UNKNOWN
   
   




[GitHub] [hudi] zhangyue19921010 commented on pull request #4078: [HUDI-2833] Clean up unused archive files instead of expanding indefinitely.

2021-12-18 Thread GitBox


zhangyue19921010 commented on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-997345000


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot commented on pull request #4346: [HUDI-3045] New ClusteringPlanStrategy to use regex choose partitions when building clustering plan

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4346:
URL: https://github.com/apache/hudi/pull/4346#issuecomment-997344984


   
   ## CI report:
   
   * 2227d98a76c74d94538a57467fe4d72f0a0daeae Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4399)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4406)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4408)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4425)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4430)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4435)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4458)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4473)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4547)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997344657


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   * 75f37f592493e7cfa4ac52ceb5c9dbb99eb1e59b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4541)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4546)
 
   * d1410a1f18f89bfaacd2ba5fff3ca564d45e1699 UNKNOWN
   
   




[GitHub] [hudi] zhangyue19921010 commented on pull request #4346: [HUDI-3045] New ClusteringPlanStrategy to use regex choose partitions when building clustering plan

2021-12-18 Thread GitBox


zhangyue19921010 commented on pull request #4346:
URL: https://github.com/apache/hudi/pull/4346#issuecomment-997344925


   @hudi-bot run azure






[GitHub] [hudi] zhangyue19921010 removed a comment on pull request #4346: [HUDI-3045] New ClusteringPlanStrategy to use regex choose partitions when building clustering plan

2021-12-18 Thread GitBox


zhangyue19921010 removed a comment on pull request #4346:
URL: https://github.com/apache/hudi/pull/4346#issuecomment-997082806


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997344657


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   * 75f37f592493e7cfa4ac52ceb5c9dbb99eb1e59b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4541)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4546)
 
   * d1410a1f18f89bfaacd2ba5fff3ca564d45e1699 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997344387


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   * 75f37f592493e7cfa4ac52ceb5c9dbb99eb1e59b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4541)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4546)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997344387


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   * 75f37f592493e7cfa4ac52ceb5c9dbb99eb1e59b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4541)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4546)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997320893


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   * 75f37f592493e7cfa4ac52ceb5c9dbb99eb1e59b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4541)
 
   
   




[GitHub] [hudi] nsivabalan commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


nsivabalan commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997344375


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-997338309


   
   ## CI report:
   
   * 38fc71de42b6d4a73de6c5acef52b55d6a278f7d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4294)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4304)
 
   * 5ed6cf0d0f66876c76bcae3fadeaf1f366413cd4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4545)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-997343538


   
   ## CI report:
   
   * 5ed6cf0d0f66876c76bcae3fadeaf1f366413cd4 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4545)
 
   
   




[GitHub] [hudi] stym06 commented on issue #3890: [SUPPORT] Hudi Sync did not add previous partitions

2021-12-18 Thread GitBox


stym06 commented on issue #3890:
URL: https://github.com/apache/hudi/issues/3890#issuecomment-997343328


   @nsivabalan I don't have the hoodie folder right now, but what I saw was 
that when hive sync ran with last_commit_time as 26, it searched all the commit 
files after 26 and got the partitions that were written in those particular 
commits. However, the commits from 27 and 28 were not in the hoodie 
folder, although the commit file for the 29th was. And in the code, it is written to 
get all commits after the sync time and find partitions to add. 
   
   As a workaround, I had to add a code change that lists the wasb folder structure 
and adds the missing partitions, which seems to work. Commits from 28 and 29 were 
most probably archived.
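
The behaviour described above can be illustrated with a toy model. This is only a sketch of the reasoning in this comment, not Hudi's hive sync code: the integer commit times, the `day=NN` partition names, the file paths, and both helper functions are hypothetical stand-ins.

```python
from posixpath import dirname

# Hypothetical model of the active timeline: commit time -> partitions written.
# Commits 27 and 28 are absent because they are assumed to have been archived.
active_timeline = {
    29: ["day=29"],
}
last_sync_time = 26
synced_partitions = {"day=26"}


def partitions_since(timeline, since):
    """Collect partitions written by commits newer than `since` that are still visible."""
    found = set()
    for commit_time, partitions in timeline.items():
        if commit_time > since:
            found.update(partitions)
    return found


def partitions_from_listing(data_files):
    """Fallback: treat the parent directory of every data file as a partition."""
    return {dirname(path) for path in data_files}


# Hypothetical result of listing the table's folder structure on storage.
data_files = [
    "day=26/a.parquet",
    "day=27/b.parquet",
    "day=28/c.parquet",
    "day=29/d.parquet",
]

incremental = partitions_since(active_timeline, last_sync_time)
listed = partitions_from_listing(data_files)
# Partitions that exist on storage but were neither synced earlier nor found incrementally.
missed = listed - synced_partitions - incremental

print(sorted(incremental))
print(sorted(missed))
```

In this model the incremental pass only sees the partition from commit 29, while the listing-based pass also recovers the partitions whose commits are no longer on the active timeline, which matches the workaround described.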
   






[GitHub] [hudi] stym06 edited a comment on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2021-12-18 Thread GitBox


stym06 edited a comment on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-997342110


   @nsivabalan I have pasted the contents of the source.properties file above. 
btw, any idea how I can start spark-shell with the azure related dependencies? 
I'm getting the below error:
   ```
   spark-shell \
 --jars hadoop-azure-3.2.0.jar, azure-storage-7.0.0.jar \
 --packages org.apache.hudi:hudi-spark3-bundle_2.12:0.10.0,org.apache.spark:spark-avro_2.12:3.1.2 \
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   
   org.apache.hudi.exception.HoodieIOException: Failed to get instance of org.apache.hadoop.fs.FileSystem
 at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:104)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:87)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:69)
 at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
 at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
 at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
 at scala.Option.getOrElse(Option.scala:189)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
 ... 59 elided
   Caused by: java.io.IOException: No FileSystem for scheme: wasb
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
 at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:102)
   ```
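
One thing that may be worth checking in a command like this is the space after the comma in `--jars`: Spark expects a comma-separated list without whitespace, so the second jar could end up being parsed as a separate argument. The sketch below is only an illustration of building the argument list safely. The helper `spark_shell_command` is hypothetical, and the `spark.hadoop.fs.wasb.impl` setting is an assumption about what might help with `No FileSystem for scheme: wasb`; it is not a confirmed fix for this issue.

```python
import shlex


def spark_shell_command(jars, packages, conf):
    """Build a spark-shell argument list; --jars and --packages are comma-joined without spaces."""
    args = ["spark-shell"]
    if jars:
        args += ["--jars", ",".join(j.strip() for j in jars)]
    if packages:
        args += ["--packages", ",".join(p.strip() for p in packages)]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


cmd = spark_shell_command(
    jars=["hadoop-azure-3.2.0.jar", " azure-storage-7.0.0.jar"],
    packages=[
        "org.apache.hudi:hudi-spark3-bundle_2.12:0.10.0",
        "org.apache.spark:spark-avro_2.12:3.1.2",
    ],
    conf={
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        # Assumption: explicitly mapping the wasb scheme to the hadoop-azure
        # file system class may help when the scheme is not registered.
        "spark.hadoop.fs.wasb.impl": "org.apache.hadoop.fs.azure.NativeAzureFileSystem",
    },
)

# Print a shell-quoted command line that can be pasted into a terminal.
print(shlex.join(cmd))
```

Whether this resolves the error also depends on the jars matching the Hadoop version bundled with the Spark distribution, which this sketch does not check.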






[GitHub] [hudi] stym06 edited a comment on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2021-12-18 Thread GitBox


stym06 edited a comment on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-997342110


   @nsivabalan I have pasted the contents of the source.properties file above. 
btw, any idea how I can start spark-shell with the azure related dependencies? 
I'm getting the below error:
   ```
   org.apache.hudi.exception.HoodieIOException: Failed to get instance of org.apache.hadoop.fs.FileSystem
 at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:104)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:87)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:69)
 at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
 at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
 at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
 at scala.Option.getOrElse(Option.scala:189)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
 ... 59 elided
   Caused by: java.io.IOException: No FileSystem for scheme: wasb
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
 at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:102)
   ```






[GitHub] [hudi] stym06 commented on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2021-12-18 Thread GitBox


stym06 commented on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-997342110


   @nsivabalan any idea how I can start spark-shell with the azure related 
dependencies? I'm getting the below error:
   ```
   org.apache.hudi.exception.HoodieIOException: Failed to get instance of org.apache.hadoop.fs.FileSystem
 at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:104)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:87)
 at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:69)
 at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
 at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
 at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
 at scala.Option.getOrElse(Option.scala:189)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
 ... 59 elided
   Caused by: java.io.IOException: No FileSystem for scheme: wasb
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
 at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:102)
   ```






[GitHub] [hudi] nsivabalan commented on issue #4027: [SUPPORT] Structured streaming Async clustering IndexOutOfBoundsException

2021-12-18 Thread GitBox


nsivabalan commented on issue #4027:
URL: https://github.com/apache/hudi/issues/4027#issuecomment-997341873


   Hey @liujinhui1994 : Can you try with 0.10.0 or the latest master? It looks like we 
made a fix around 0 outputfileGroups by Nov 12 in 
[this](https://github.com/apache/hudi/pull/3833/files#r738037336) patch.






[GitHub] [hudi] nsivabalan commented on issue #3572: compatble version of hudi, hive and hadoop

2021-12-18 Thread GitBox


nsivabalan commented on issue #3572:
URL: https://github.com/apache/hudi/issues/3572#issuecomment-997341300


   Also, can you give us the full stack trace and the configs you are running with? 






[GitHub] [hudi] nsivabalan commented on issue #3572: compatble version of hudi, hive and hadoop

2021-12-18 Thread GitBox


nsivabalan commented on issue #3572:
URL: https://github.com/apache/hudi/issues/3572#issuecomment-997341166


   @niloo-sh : I see you are still using hudi 0.6.0. Can you try one of the 
latest versions, like 0.9.0 or higher? We don't think hive 3 has been tested with 
older versions of hudi. 
   Also, it would be easier to run the hive sync tool from a terminal rather than an IDE. 






[GitHub] [hudi] nsivabalan commented on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2021-12-18 Thread GitBox


nsivabalan commented on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-997340020


   Thanks. Can you post the contents of 
/opt/spark/hudi/config/source.properties as well? 






[GitHub] [hudi] nsivabalan edited a comment on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-12-18 Thread GitBox


nsivabalan edited a comment on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-997339302


   For `NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile.`, please 
refer to https://github.com/apache/hudi/issues/2498#issuecomment-969228521 for 
the proposed fix. You have to provide the jars explicitly on the classpath or add 
them to the spark/jars dir.
   
   We had issues with the EMR version of Spark; after copying the spark-sql jar to 
the spark/jars directory, we were able to resolve them. We don't face this issue when we use 
open source Spark. It's an issue only with EMR Spark. 






[GitHub] [hudi] nsivabalan commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-12-18 Thread GitBox


nsivabalan commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-997339302


   For `NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile.`, please 
refer to https://github.com/apache/hudi/issues/2498#issuecomment-969228521 for 
the proposed fix. You have to provide the jars explicitly on the classpath or add 
them to the spark/jars dir.
   
   We had issues with the EMR version of Spark; after copying the spark-sql jar to 
the spark/jars directory, we were able to resolve them. 






[GitHub] [hudi] hudi-bot removed a comment on pull request #4345: [HUDI-2970] Add test for archiving partition delete commit

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4345:
URL: https://github.com/apache/hudi/pull/4345#issuecomment-997330213


   
   ## CI report:
   
   * 00a7ae875deb424f4e6dfa7db7fd65821d8b59fa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4534)
 
   * a2eeed545f610a73f26079a8b505ff5742296002 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4544)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4345: [HUDI-2970] Add test for archiving partition delete commit

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4345:
URL: https://github.com/apache/hudi/pull/4345#issuecomment-997338331


   
   ## CI report:
   
   * a2eeed545f610a73f26079a8b505ff5742296002 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4544)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-997338026


   
   ## CI report:
   
   * 38fc71de42b6d4a73de6c5acef52b55d6a278f7d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4294)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4304)
 
   * 5ed6cf0d0f66876c76bcae3fadeaf1f366413cd4 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-997338309


   
   ## CI report:
   
   * 38fc71de42b6d4a73de6c5acef52b55d6a278f7d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4294)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4304)
 
   * 5ed6cf0d0f66876c76bcae3fadeaf1f366413cd4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4545)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4270: [HUDI-2811] Support Spark 3.2

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-997338026


   
   ## CI report:
   
   * 38fc71de42b6d4a73de6c5acef52b55d6a278f7d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4294)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4304)
 
   * 5ed6cf0d0f66876c76bcae3fadeaf1f366413cd4 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4270: [HUDI-2811] Support Spark 3.2

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4270:
URL: https://github.com/apache/hudi/pull/4270#issuecomment-994406673


   
   ## CI report:
   
   * 38fc71de42b6d4a73de6c5acef52b55d6a278f7d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4294)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4304)
 
   
   




[GitHub] [hudi] Gatsby-Lee commented on issue #3975: [SUPPORT] Question on hudi's delete statment taking too long

2021-12-18 Thread GitBox


Gatsby-Lee commented on issue #3975:
URL: https://github.com/apache/hudi/issues/3975#issuecomment-997336494


   @nsivabalan  isn't the BLOOM index the default one?






[jira] [Updated] (HUDI-3066) Very slow file listing after enabling metadata for existing tables in 0.10.0 release

2021-12-18 Thread Harsha Teja Kanna (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsha Teja Kanna updated HUDI-3066:

Priority: Minor  (was: Critical)

> Very slow file listing after enabling metadata for existing tables in 0.10.0 
> release
> 
>
> Key: HUDI-3066
> URL: https://issues.apache.org/jira/browse/HUDI-3066
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.10.0
> Environment: EMR 6.4.0
> Hudi version : 0.10.0
>Reporter: Harsha Teja Kanna
>Priority: Minor
>  Labels: performance, pull-request-available
> Attachments: Screen Shot 2021-12-18 at 6.16.29 PM.png
>
>
> After the metadata table is enabled, file listing takes a long time.
> If metadata is also enabled on the reader side, each file listing task takes 
> even longer.
> Existing tables (COW) have inline clustering enabled and many replace commits.
> Logs suggest the delay is in the resetFileGroupsReplaced function of 
> view.AbstractTableFileSystemView, or in metadata.HoodieBackedTableMetadata.
> There are also many log messages from AbstractHoodieLogRecordReader.
>  
> 2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: Took 4118 ms 
> to read  136 instants, 9731 replaced file groups
> 2021-12-18 23:37:46,086 INFO log.AbstractHoodieLogRecordReader: Number of 
> remaining logblocks to merge 1
> 2021-12-18 23:37:46,090 INFO log.AbstractHoodieLogRecordReader: Reading a 
> data block from file 
> s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.76_0-20-515
>  at instant 20211217035105329
> 2021-12-18 23:37:46,090 INFO log.AbstractHoodieLogRecordReader: Number of 
> remaining logblocks to merge 1
> 2021-12-18 23:37:46,094 INFO log.HoodieLogFormatReader: Moving to the next 
> reader for logfile 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.121_0-57-663',
>  fileLen=0}
> 2021-12-18 23:37:46,095 INFO log.AbstractHoodieLogRecordReader: Scanning log 
> file 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.20_0-35-613',
>  fileLen=0}
> 2021-12-18 23:37:46,095 INFO s3a.S3AInputStream: Switching to Random IO seek 
> policy
> 2021-12-18 23:37:46,096 INFO log.AbstractHoodieLogRecordReader: Reading a 
> data block from file 
> s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.62_0-34-377
>  at instant 20211217022049877
> 2021-12-18 23:37:46,096 INFO log.AbstractHoodieLogRecordReader: Number of 
> remaining logblocks to merge 1
> 2021-12-18 23:37:46,105 INFO log.HoodieLogFormatReader: Moving to the next 
> reader for logfile 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.86_0-20-362',
>  fileLen=0}
> 2021-12-18 23:37:46,109 INFO log.AbstractHoodieLogRecordReader: Scanning log 
> file 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.121_0-57-663',
>  fileLen=0}
> 2021-12-18 23:37:46,109 INFO s3a.S3AInputStream: Switching to Random IO seek 
> policy
> 2021-12-18 23:37:46,110 INFO log.HoodieLogFormatReader: Moving to the next 
> reader for logfile 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.77_0-35-590',
>  fileLen=0}
> 2021-12-18 23:37:46,112 INFO log.AbstractHoodieLogRecordReader: Reading a 
> data block from file 
> s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.20_0-35-613
>  at instant 20211216183448389
> 2021-12-18 23:37:46,112 INFO log.AbstractHoodieLogRecordReader: Number of 
> remaining logblocks to merge 1
> 2021-12-18 23:37:46,118 INFO log.HoodieLogFormatReader: Moving to the next 
> reader for logfile 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.63_0-56-519',
>  fileLen=0}
> 2021-12-18 23:37:46,122 INFO log.AbstractHoodieLogRecordReader: Scanning log 
> file 
> HoodieLogFile\{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.86_0-20-362',
>  fileLen=0}
> 2021-12-18 23:37:46,122 INFO s3a.S3AInputStream: Switching to Random IO seek 
> policy
> 2021-12-18 23:37:46,123 INFO log.AbstractHoodieLogRecordReader: Reading a 
> data block from file 
> s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.121_0-57-663
>  at instant 20211217090337935
> 2021-12-18 23:37:46,123 INFO log.AbstractHoodieLogRecordReader: Number of 
> remaining logblocks to merge 1
> 2021-12-18 23:37:46,127 INFO log.AbstractHoodieLogRecordReader: Scanning log 
> file 
> HoodieLogFile\{pathStr='s3a://
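
One rough way to see which loggers dominate an excerpt like the one quoted above is to tally lines per logger. The following is a minimal sketch, assuming the log layout shown in the quote (timestamp, level, logger, message); the function name and the regular expression are illustrative and not part of Hudi.

```python
import re
from collections import Counter

# Matches the prefix of lines such as
# "2021-12-18 23:37:46,090 INFO log.AbstractHoodieLogRecordReader: ..."
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (?P<level>[A-Z]+) "
    r"(?P<logger>[\w.]+): "
)


def count_by_logger(lines):
    """Count log lines per (level, logger); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


# Sample lines taken from the quoted log, with the e-mail line wrapping undone.
sample = [
    "2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: "
    "Took 4118 ms to read  136 instants, 9731 replaced file groups",
    "2021-12-18 23:37:46,086 INFO log.AbstractHoodieLogRecordReader: "
    "Number of remaining logblocks to merge 1",
    "2021-12-18 23:37:46,090 INFO log.AbstractHoodieLogRecordReader: "
    "Number of remaining logblocks to merge 1",
    "2021-12-18 23:37:46,095 INFO s3a.S3AInputStream: "
    "Switching to Random IO seek policy",
]

counts = count_by_logger(sample)
for (level, logger), n in counts.most_common():
    print(level, logger, n)
```

A tally like this only shows which loggers are chatty; it does not by itself prove where the time is spent.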

[GitHub] [hudi] stym06 commented on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2021-12-18 Thread GitBox


stym06 commented on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-997334777


   @nsivabalan I tried querying through both Presto and Hive and got duplicate 
records. I have yet to query through the Spark datasource. I will post the 
`.hoodie` folder shortly. Posting the spark-submit below:
   
   ```
   #
   # Copyright 2018 Google LLC
   #
   # Licensed under the Apache License, Version 2.0 (the "License");
   # you may not use this file except in compliance with the License.
   # You may obtain a copy of the License at
   #
   # https://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing, software
   # distributed under the License is distributed on an "AS IS" BASIS,
   # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   # See the License for the specific language governing permissions and
   # limitations under the License.
   # 
   
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: hudi-lpe-ds-{{ ti.job_id }}
     namespace: dataplatform
     annotations:
       spark.platform/type: streaming
     labels:
       spark_name: hudi-lpe-ds-{{ ti.job_id }}
       dag_name: hudi-lpe
       task_name: ds
       environment: "prod"
       cloud: "azure"
       tier: "t1"
       team: "dataplatform"
       service_type: "airflow"
       k8s_cluster_name: "kai"
   
   spec:
     type: Java
     mode: cluster
     image: "hudi-ds-azure-0.2"
     imagePullPolicy: Always
     mainClass: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
     mainApplicationFile: "local:///opt/spark/hudi/hudi-utilities-bundle_2.11-0.9.0-SNAPSHOT.jar"
     sparkConf:
       "spark.serializer": "org.apache.spark.serializer.KryoSerializer"
     arguments:
       - "--table-type"
       - "COPY_ON_WRITE"
       - "--props"
       - "/opt/spark/hudi/config/source.properties"
       - "--schemaprovider-class"
       - "org.apache.hudi.utilities.schema.SchemaRegistryProvider"
       - "--source-class"
       - "org.apache.hudi.utilities.sources.JsonKafkaSource"
       - "--target-base-path"
       - "wasb://container...@account.blob.core.windows.net/data/pipelines/hudi/kafka/telemetrics_v2/dp.hmi.quectel.event.lpe.packet.v2"
       - "--target-table"
       - "dp_hmi_quectel_event_lpe_packet_v2"
       - "--op"
       - "INSERT"
       - "--source-ordering-field"
       - "timestamp"
       - "--continuous"
       - "--min-sync-interval-seconds"
       - "60"
     sparkVersion: "2.4.4"
     restartPolicy:
       type: Always
       onFailureRetries: 10
       onFailureRetryInterval: 60
       onSubmissionFailureRetries: 10
       onSubmissionFailureRetryInterval: 60
     timeToLiveSeconds: 3600
     volumes:
       - name: hudi-lpe-ds
         configMap:
           name: hudi-lpe-ds
     driver:
       env:
         - name: HOODIE_ENV_fs_DOT_azure_DOT_wasb_DOT_account_DOT_name
           value: {{ var.value.HOODIE_ENV_fs_DOT_azure_DOT_wasb_DOT_account_DOT_name }}
         - name: HOODIE_ENV_fs_DOT_azure_DOT_account_DOT_key_DOT_{{ var.value.DP_DPV3_BLOB_STORAGE }}_DOT_blob_DOT_core_DOT_windows_DOT_net
           value: {{ var.value.HOODIE_ENV_fs_DOT_azure_DOT_account_DOT_key_DOT_account_DOT_blob_DOT_core_DOT_windows_DOT_net }}
       cores: 1
       coreLimit: "1200m"
       memory: "4G"
       serviceAccount: "dataplatform"
       volumeMounts:
         - name: hudi-lpe-ds
           mountPath: /opt/spark/hudi/config
           subpath: config.yaml
       memoryOverhead: "1024"
       javaOptions: "-Dnetworkaddress.cache.ttl=60 -Duser.timezone=IST -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/varadarb_ds_driver.hprof"
       # affinity:
       #   nodeAffinity:
       #     requiredDuringSchedulingIgnoredDuringExecution:
       #       nodeSelectorTerms:
       #         - matchExpressions:
       #             - key: service
       #               operator: In
       #               values:
       #                 - airflow-spark
       #             - key: "node-lifecycle"
       #               operator: In
       #               values:
       #                 - "ondemand"
     executor:
       env:
         - name: HOODIE_ENV_fs_DOT_azure_DOT_wasb_DOT_account_DOT_name
           value: {{ var.value.HOODIE_ENV_fs_DOT_azure_DOT_wasb_DOT_account_DOT_name }}
         - name: HOODIE_ENV_fs_DOT_azure_DOT_account_DOT_key_DOT_{{ var.value.DP_DPV3_BLOB_STORAGE }}_DOT_blob_DOT_core_DOT_windows_DOT_net
           value: {{ var.value.HOODIE_ENV_fs_DOT_azure_DOT_account_DOT_key_DOT_account_DOT_blob_DOT_core_DOT_windows_DOT_net }}
       cores: 1
       instances: 3
       memory: "6G"
       volumeMounts:
         - name: hudi-lpe-ds
           mountPath: /opt/spark/hudi/config
           subpath: config.yaml
       memoryOverhead: "3072"
       javaOptions: "-Dnetworkaddress.cache.ttl=60 -Duser.timezone=IST 

[GitHub] [hudi] hudi-bot removed a comment on pull request #4383: HUDI-3066 - Reduce log level in hot path of scan in log record reader

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4383:
URL: https://github.com/apache/hudi/pull/4383#issuecomment-997329969


   
   ## CI report:
   
   * db3b8ce5a4d589fadf721e5427a0c185de61 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4543)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4383: HUDI-3066 - Reduce log level in hot path of scan in log record reader

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4383:
URL: https://github.com/apache/hudi/pull/4383#issuecomment-997334430


   
   ## CI report:
   
   * db3b8ce5a4d589fadf721e5427a0c185de61 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4543)
 
   
   




[GitHub] [hudi] nsivabalan commented on issue #4072: [SUPPORT]Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6

2021-12-18 Thread GitBox


nsivabalan commented on issue #4072:
URL: https://github.com/apache/hudi/issues/4072#issuecomment-997332848


   Awesome, thanks for letting us know. We didn't know you could prefix Hadoop 
options with `spark.hadoop.` and pass them as Spark configs. I will go ahead and 
close the GitHub issue. 
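
For readers unfamiliar with the prefix: Spark forwards any property named `spark.hadoop.<key>` to the Hadoop configuration as `<key>`. The sketch below only imitates that mapping in plain Python to show the idea; it is not Spark's code, the function name is hypothetical, and `fs.defaultFS` is just an example key (the setting actually used in the issue is not shown in this thread).

```python
# Illustration only: imitates how "spark.hadoop."-prefixed Spark properties
# end up as Hadoop properties. This is not Spark's implementation.
SPARK_HADOOP_PREFIX = "spark.hadoop."


def hadoop_conf_from_spark_conf(spark_conf):
    """Return the Hadoop properties implied by a dict of Spark properties."""
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in spark_conf.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


# Hypothetical settings; fs.defaultFS is used purely as an example key.
spark_conf = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.hadoop.fs.defaultFS": "hdfs://localhost:9000",
}

hadoop_conf = hadoop_conf_from_spark_conf(spark_conf)
print(hadoop_conf)
```

In practice such a property would be passed with `--conf spark.hadoop.<key>=<value>` or set on the SparkConf before the session is created.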






[GitHub] [hudi] nsivabalan closed issue #4072: [SUPPORT]Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/scala/table6

2021-12-18 Thread GitBox


nsivabalan closed issue #4072:
URL: https://github.com/apache/hudi/issues/4072


   






[GitHub] [hudi] nsivabalan edited a comment on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2021-12-18 Thread GitBox


nsivabalan edited a comment on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-997089823


   @stym06 : Did you try querying via the Spark datasource? Do you see the same 
duplicates? That would help us rule out an issue with the underlying storage or 
query-engine-specific behavior.
   Also, can you post the contents of .hoodie? I want to check whether the two 
commit times are from two writes or whether one is from compaction. 
   @bvaradar @bhasudha @satishkotha : have you folks ever encountered duplicate 
records? Any idea why this could happen? 
   






[GitHub] [hudi] nsivabalan edited a comment on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2021-12-18 Thread GitBox


nsivabalan edited a comment on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-997331339


   Also, can you give us the spark-submit command (mask any info as necessary) 
you used to trigger the deltastreamer? 
   I assume there is no clustering. Can you confirm, please?






[GitHub] [hudi] nsivabalan commented on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2021-12-18 Thread GitBox


nsivabalan commented on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-997331339


   Also, can you give us the spark-submit command (mask any info as necessary)? 






[GitHub] [hudi] hudi-bot commented on pull request #4345: [HUDI-2970] Add test for archiving partition delete commit

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4345:
URL: https://github.com/apache/hudi/pull/4345#issuecomment-997330213


   
   ## CI report:
   
   * 00a7ae875deb424f4e6dfa7db7fd65821d8b59fa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4534)
 
   * a2eeed545f610a73f26079a8b505ff5742296002 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4544)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4345: [HUDI-2970] Add test for archiving partition delete commit

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4345:
URL: https://github.com/apache/hudi/pull/4345#issuecomment-997329954


   
   ## CI report:
   
   * 00a7ae875deb424f4e6dfa7db7fd65821d8b59fa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4534)
 
   * a2eeed545f610a73f26079a8b505ff5742296002 UNKNOWN
   
   




[jira] [Assigned] (HUDI-3065) spark auto partition discovery does not work from 0.9.0

2021-12-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3065:


Assignee: Yann Byron  (was: Raymond Xu)

> spark auto partition discovery does not work from 0.9.0
> ---
>
> Key: HUDI-3065
> URL: https://issues.apache.org/jira/browse/HUDI-3065
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: sivabalan narayanan
>Assignee: Yann Byron
>Priority: Major
>  Labels: core-flow-ds, sev:critical, spark
>
> With 0.8.0, if the partition path has the format "/partitionKey=partitionValue", 
> Spark auto partition discovery kicks in, and the partition keys appear as 
> explicit fields in Hudi's table schema. 
> With 0.9.0, this no longer happens. 
> // launch spark shell with 0.8.0 
> {code:java}
> import org.apache.hudi.QuickstartUtils._
> import scala.collection.JavaConversions._
> import org.apache.spark.sql.SaveMode._
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
> val tableName = "hudi_trips_cow"
> val basePath = "file:///tmp/hudi_trips_cow"
> val dataGen = new DataGenerator
> val inserts = convertToStringList(dataGen.generateInserts(10))
> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> val newDf = df.withColumn("partitionpath", regexp_replace($"partitionpath", 
>   "(.*)(\\/){1}(.*)(\\/){1}", "continent=$1$2country=$3$4city="))
> newDf.write.format("hudi").
>   options(getQuickstartWriteConfigs).
>   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
>   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
>   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
>   option(TABLE_NAME, tableName).
>   mode(Overwrite).
>   save(basePath)
> val tripsSnapshotDF = spark.
>   read.
>   format("hudi").
>   load(basePath)
> tripsSnapshotDF.printSchema
> {code}
> // output: note continent, country, and city at the end. 
>  |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- begin_lat: double (nullable = true)
>  |-- begin_lon: double (nullable = true)
>  |-- driver: string (nullable = true)
>  |-- end_lat: double (nullable = true)
>  |-- end_lon: double (nullable = true)
>  |-- fare: double (nullable = true)
>  |-- partitionpath: string (nullable = true)
>  |-- rider: string (nullable = true)
>  |-- ts: long (nullable = true)
>  |-- uuid: string (nullable = true)
>  |-- continent: string (nullable = true)
>  |-- country: string (nullable = true)
>  |-- city: string (nullable = true)
>  
>  
> Let's run this with 0.9.0.
> {code:java}
> import org.apache.hudi.QuickstartUtils._
> import scala.collection.JavaConversions._
> import org.apache.spark.sql.SaveMode._
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
> val tableName = "hudi_trips_cow"
> val basePath = "file:///tmp/hudi_trips_cow"
> val dataGen = new DataGenerator
> val inserts = convertToStringList(dataGen.generateInserts(10))
> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
> val newDf = df.withColumn("partitionpath", regexp_replace($"partitionpath", 
>   "(.*)(\\/){1}(.*)(\\/){1}", "continent=$1$2country=$3$4city="))
> newDf.write.format("hudi").
>   options(getQuickstartWriteConfigs).
>   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
>   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
>   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
>   option(TABLE_NAME, tableName).
>   mode(Overwrite).
>   save(basePath)
> val tripsSnapshotDF = spark.
>   read.
>   format("hudi").
>   load(basePath)
> tripsSnapshotDF.printSchema
> {code}
> // output: continent, country, and city are missing. 
> root
>  |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- begin_lat: double (nullable = true)
>  |-- begin_lon: double (nullable = true)
>  |-- driver: string (nullable = true)
>  |-- end_lat: double (nullable = true)
>  |-- end_lon: double (nullable = true)
>  |-- fare: double (nullable = true)
>  |-- rider: string (nullable = true)
>  |-- ts: long (nullable = true)
>  |-- uuid: string (nullable = true)
>  |-- partitionpath: string (nullable = true)
>  
> Ref issue: [https://github.com/apache/hudi/issues/3984]
>  
>  
>  
>  
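
To make the repro above easier to follow, here is a small sketch of what the `regexp_replace` call does to a partition path, and of how a `key=value` path maps to columns. It is written in plain Python rather than Spark, so it is only an approximation: Python writes group references as `\1` where the Java regex in the snippet uses `$1`, the sample path `asia/india/chennai` is an assumed value in the three-level layout the regex expects, and `discover_partition_columns` is a hypothetical helper, not Spark's partition discovery code.

```python
import re

# Python equivalent of the pattern and replacement in the quoted snippet.
PATTERN = re.compile(r"(.*)(/)(.*)(/)")
REPLACEMENT = r"continent=\1\2country=\3\4city="


def to_hive_style(partition_path):
    """Rewrite 'a/b/c' into 'continent=a/country=b/city=c'."""
    return PATTERN.sub(REPLACEMENT, partition_path)


def discover_partition_columns(partition_path):
    """Split a key=value partition path into a column -> value mapping."""
    return dict(segment.split("=", 1) for segment in partition_path.split("/"))


# Assumed sample value; any three-level path behaves the same way.
hive_path = to_hive_style("asia/india/chennai")
columns = discover_partition_columns(hive_path)
print(hive_path)
print(columns)
```

The three keys produced here correspond to the `continent`, `country`, and `city` fields that show up in the 0.8.0 schema and are absent in the 0.9.0 schema.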



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot removed a comment on pull request #4383: HUDI-3066 - Reduce log level in hot path of scan in log record reader

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4383:
URL: https://github.com/apache/hudi/pull/4383#issuecomment-997329732


   
   ## CI report:
   
   * db3b8ce5a4d589fadf721e5427a0c185de61 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4383: HUDI-3066 - Reduce log level in hot path of scan in log record reader

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4383:
URL: https://github.com/apache/hudi/pull/4383#issuecomment-997329969


   
   ## CI report:
   
   * db3b8ce5a4d589fadf721e5427a0c185de61 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4543)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4345: [HUDI-2970] Add test for archiving partition delete commit

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4345:
URL: https://github.com/apache/hudi/pull/4345#issuecomment-997308459


   
   ## CI report:
   
   * 00a7ae875deb424f4e6dfa7db7fd65821d8b59fa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4534)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4345: [HUDI-2970] Add test for archiving partition delete commit

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4345:
URL: https://github.com/apache/hudi/pull/4345#issuecomment-997329954


   
   ## CI report:
   
   * 00a7ae875deb424f4e6dfa7db7fd65821d8b59fa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4534)
 
   * a2eeed545f610a73f26079a8b505ff5742296002 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4383: HUDI-3066 - Reduce log level in hot path of scan in log record reader

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4383:
URL: https://github.com/apache/hudi/pull/4383#issuecomment-997329732


   
   ## CI report:
   
   * db3b8ce5a4d589fadf721e5427a0c185de61 UNKNOWN
   
   




[jira] [Updated] (HUDI-3066) Very slow file listing after enabling metadata for existing tables in 0.10.0 release

2021-12-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3066:
-
Labels: performance pull-request-available  (was: performance)

> Very slow file listing after enabling metadata for existing tables in 0.10.0 
> release
> 
>
> Key: HUDI-3066
> URL: https://issues.apache.org/jira/browse/HUDI-3066
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.10.0
> Environment: EMR 6.4.0
> Hudi version : 0.10.0
> Reporter: Harsha Teja Kanna
> Priority: Critical
> Labels: performance, pull-request-available
> Attachments: Screen Shot 2021-12-18 at 6.16.29 PM.png
>
>
> After the metadata table is enabled, file listing takes a long time.
> If metadata is enabled on the reader side, each file listing task takes 
> even longer.
> Existing tables (COW) have inline clustering on and have many replace commits.
> Logs seem to suggest the delay is in the view.AbstractTableFileSystemView 
> resetFileGroupsReplaced function or in metadata.HoodieBackedTableMetadata.
> There are also many log messages from AbstractHoodieLogRecordReader.
>  
> 2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: Took 4118 ms to read  136 instants, 9731 replaced file groups
> 2021-12-18 23:37:46,086 INFO log.AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 1
> 2021-12-18 23:37:46,090 INFO log.AbstractHoodieLogRecordReader: Reading a data block from file s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.76_0-20-515 at instant 20211217035105329
> 2021-12-18 23:37:46,090 INFO log.AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 1
> 2021-12-18 23:37:46,094 INFO log.HoodieLogFormatReader: Moving to the next reader for logfile HoodieLogFile{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.121_0-57-663', fileLen=0}
> 2021-12-18 23:37:46,095 INFO log.AbstractHoodieLogRecordReader: Scanning log file HoodieLogFile{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.20_0-35-613', fileLen=0}
> 2021-12-18 23:37:46,095 INFO s3a.S3AInputStream: Switching to Random IO seek policy
> 2021-12-18 23:37:46,096 INFO log.AbstractHoodieLogRecordReader: Reading a data block from file s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.62_0-34-377 at instant 20211217022049877
> 2021-12-18 23:37:46,096 INFO log.AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 1
> 2021-12-18 23:37:46,105 INFO log.HoodieLogFormatReader: Moving to the next reader for logfile HoodieLogFile{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.86_0-20-362', fileLen=0}
> 2021-12-18 23:37:46,109 INFO log.AbstractHoodieLogRecordReader: Scanning log file HoodieLogFile{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.121_0-57-663', fileLen=0}
> 2021-12-18 23:37:46,109 INFO s3a.S3AInputStream: Switching to Random IO seek policy
> 2021-12-18 23:37:46,110 INFO log.HoodieLogFormatReader: Moving to the next reader for logfile HoodieLogFile{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.77_0-35-590', fileLen=0}
> 2021-12-18 23:37:46,112 INFO log.AbstractHoodieLogRecordReader: Reading a data block from file s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.20_0-35-613 at instant 20211216183448389
> 2021-12-18 23:37:46,112 INFO log.AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 1
> 2021-12-18 23:37:46,118 INFO log.HoodieLogFormatReader: Moving to the next reader for logfile HoodieLogFile{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.63_0-56-519', fileLen=0}
> 2021-12-18 23:37:46,122 INFO log.AbstractHoodieLogRecordReader: Scanning log file HoodieLogFile{pathStr='s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.86_0-20-362', fileLen=0}
> 2021-12-18 23:37:46,122 INFO s3a.S3AInputStream: Switching to Random IO seek policy
> 2021-12-18 23:37:46,123 INFO log.AbstractHoodieLogRecordReader: Reading a data block from file s3a://datalake-hudi/sessions/.hoodie/metadata/files/.files-_20211216144130775001.log.121_0-57-663 at instant 20211217090337935
> 2021-12-18 23:37:46,123 INFO log.AbstractHoodieLogRecordReader: Number of remaining logblocks to merge 1
> 2021-12-18 23:37:46,127 INFO log.AbstractHoodieLogRecordReader: Scanning log file Hoo

[GitHub] [hudi] h7kanna opened a new pull request #4383: HUDI-3066 - Reduce log level in hot path of scan in log record reader

2021-12-18 Thread GitBox


h7kanna opened a new pull request #4383:
URL: https://github.com/apache/hudi/pull/4383


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Fixes HUDI-3066
   
   ## Brief change log
   
   Change the log level from INFO to DEBUG in the hot path of the scan in the log record reader.
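
   The effect of such a change can be sketched as follows. This is only an illustration and not the code touched by this pull request: it assumes `java.util.logging` as a stand-in logger (where `FINE` plays roughly the role of DEBUG), and the class and method names `HotPathLoggingSketch`, `scan` and `countEmitted` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch: per-item messages in a hot loop are logged at FINE
// (java.util.logging's rough equivalent of DEBUG), the summary stays at INFO.
class HotPathLoggingSketch {
  private static final Logger LOG = Logger.getLogger("HotPathLoggingSketch");

  // Walks over the given (made-up) log file names and returns how many were visited.
  static int scan(List<String> logFiles) {
    int scanned = 0;
    for (String file : logFiles) {
      // The supplier is only evaluated when FINE is enabled for this logger.
      LOG.fine(() -> "Scanning log file " + file);
      scanned++;
    }
    LOG.info("Scanned " + scanned + " log files");
    return scanned;
  }

  // Runs scan() with the logger set to the given level and counts emitted records.
  static int countEmitted(Level level, List<String> logFiles) {
    List<LogRecord> seen = new ArrayList<>();
    Handler collector = new Handler() {
      @Override public void publish(LogRecord record) { seen.add(record); }
      @Override public void flush() { }
      @Override public void close() { }
    };
    collector.setLevel(Level.ALL);
    LOG.setUseParentHandlers(false);
    LOG.setLevel(level);
    LOG.addHandler(collector);
    try {
      scan(logFiles);
    } finally {
      LOG.removeHandler(collector);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList("log.1", "log.2", "log.3");
    System.out.println("records at INFO: " + countEmitted(Level.INFO, files));
    System.out.println("records at FINE: " + countEmitted(Level.FINE, files));
  }
}
```

   With the logger at INFO only the summary line is emitted, so the number of log records no longer grows with the number of files scanned.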
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Closed] (HUDI-3052) Flaky TestJsonKafkaSource in CI runs

2021-12-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-3052.

Resolution: Fixed

> Flaky TestJsonKafkaSource in  CI runs
> -
>
> Key: HUDI-3052
> URL: https://issues.apache.org/jira/browse/HUDI-3052
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Testing
> Reporter: sivabalan narayanan
> Assignee: Raymond Xu
> Priority: Major
> Labels: pull-request-available, sev:critical
>
> TestJsonKafkaSource.testJsonKafkaSourceResetStrategy
> Reference: 
> [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/4441/logs/26]
>  
>  
> {code:java}
> 2021-12-17T16:46:09.3127345Z 1401494 [controller-event-thread] ERROR kafka.controller.ControllerEventManager$ControllerEventThread  - [ControllerEventThread controllerId=0] Error processing event Startup
> 2021-12-17T16:46:09.3128803Z org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /controller
> 2021-12-17T16:46:09.3131970Z  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> 2021-12-17T16:46:09.3133297Z  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 2021-12-17T16:46:09.3134202Z  at kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:492)
> 2021-12-17T16:46:09.3135241Z  at kafka.zk.KafkaZkClient.registerZNodeChangeHandlerAndCheckExistence(KafkaZkClient.scala:1222)
> 2021-12-17T16:46:09.3135907Z  at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1147)
> 2021-12-17T16:46:09.3139892Z  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
> 2021-12-17T16:46:09.3140738Z  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
> 2021-12-17T16:46:09.3146511Z  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
> 2021-12-17T16:46:09.3147294Z  at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
> 2021-12-17T16:46:09.3150920Z  at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
> 2021-12-17T16:46:09.3155198Z  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> 2021-12-17T16:46:09.3347969Z 1401510 [main] ERROR kafka.server.KafkaServer  - [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown
> 2021-12-17T16:46:09.3351046Z org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /brokers/topics/__consumer_offsets
> 2021-12-17T16:46:09.3352565Z  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> 2021-12-17T16:46:09.3353401Z  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 2021-12-17T16:46:09.3354446Z  at kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:492)
> 2021-12-17T16:46:09.3355088Z  at kafka.zk.KafkaZkClient$$anonfun$getReplicaAssignmentForTopics$1.apply(KafkaZkClient.scala:468)
> 2021-12-17T16:46:09.3355763Z  at kafka.zk.KafkaZkClient$$anonfun$getReplicaAssignmentForTopics$1.apply(KafkaZkClient.scala:463)
> 2021-12-17T16:46:09.3356500Z  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> 2021-12-17T16:46:09.3357124Z  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> 2021-12-17T16:46:09.3357956Z  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 2021-12-17T16:46:09.3358502Z  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> 2021-12-17T16:46:09.3359069Z  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> 2021-12-17T16:46:09.3359627Z  at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
> 2021-12-17T16:46:09.3360171Z  at kafka.zk.KafkaZkClient.getReplicaAssignmentForTopics(KafkaZkClient.scala:463)
> 2021-12-17T16:46:09.3360744Z  at kafka.zk.KafkaZkClient.getTopicPartitionCount(KafkaZkClient.scala:513)
> 2021-12-17T16:46:09.3361394Z  at kafka.coordinator.group.GroupMetadataManager.getGroupMetadataTopicPartitionCount(GroupMetadataManager.scala:870)
> 2021-12-17T16:46:09.3362046Z  at kafka.coordinator.group.GroupMetadataManager.<init>(GroupMetadataManager.scala:74)
> 2021-12-17T16:46:09.3362646Z  at kafka.coordinator.group.GroupCoordinator$.apply(GroupCoordinator.scala:906)
> 2021-12-17T16:46:09.3363588Z  at kafka.coordinator.group.GroupCoordinator$.apply(GroupCoordinator.scala:879)
> 2021-12-17T16:46:09.3364062Z  at kafka.server.KafkaSer

[GitHub] [hudi] hudi-bot commented on pull request #4141: [HUDI-2815] Support partial update for streaming change logs

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4141:
URL: https://github.com/apache/hudi/pull/4141#issuecomment-997327537


   
   ## CI report:
   
   * a71eee9dbe65acc7e9c2ba524698334207066c58 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4542)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4141: [HUDI-2815] Support partial update for streaming change logs

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4141:
URL: https://github.com/apache/hudi/pull/4141#issuecomment-997323574


   
   ## CI report:
   
   * eac46ffdf438aa73330768656031bc251aa6003d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3864)
 
   * a71eee9dbe65acc7e9c2ba524698334207066c58 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4542)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4141: [HUDI-2815] Support partial update for streaming change logs

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4141:
URL: https://github.com/apache/hudi/pull/4141#issuecomment-997323365


   
   ## CI report:
   
   * eac46ffdf438aa73330768656031bc251aa6003d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3864)
 
   * a71eee9dbe65acc7e9c2ba524698334207066c58 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4141: [HUDI-2815] Support partial update for streaming change logs

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4141:
URL: https://github.com/apache/hudi/pull/4141#issuecomment-997323574


   
   ## CI report:
   
   * eac46ffdf438aa73330768656031bc251aa6003d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3864)
 
   * a71eee9dbe65acc7e9c2ba524698334207066c58 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4542)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4141: [HUDI-2815] Support partial update for streaming change logs

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4141:
URL: https://github.com/apache/hudi/pull/4141#issuecomment-980838690


   
   ## CI report:
   
   * eac46ffdf438aa73330768656031bc251aa6003d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3864)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4141: [HUDI-2815] Support partial update for streaming change logs

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4141:
URL: https://github.com/apache/hudi/pull/4141#issuecomment-997323365


   
   ## CI report:
   
   * eac46ffdf438aa73330768656031bc251aa6003d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3864)
 
   * a71eee9dbe65acc7e9c2ba524698334207066c58 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-997321582


   
   ## CI report:
   
   * 45769dd17905240d5b513d304e5f9e86fe094642 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4539)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-997317565


   
   ## CI report:
   
   * 394525870ef7d82aa426104f807f5480acae2b7c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4478)
 
   * 45769dd17905240d5b513d304e5f9e86fe094642 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4539)
 
   
   




[GitHub] [hudi] lsyldliu commented on a change in pull request #4141: [HUDI-2815] Support partial update for streaming change logs

2021-12-18 Thread GitBox


lsyldliu commented on a change in pull request #4141:
URL: https://github.com/apache/hudi/pull/4141#discussion_r771890386



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdateWithLatestAvroPayload.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.Properties;
+
+import static org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro;
+
+/**
+ * The only difference with {@link DefaultHoodieRecordPayload} is that support update partial fields
+ * in latest record to old record instead of all fields.
+ */
+public class PartialUpdateWithLatestAvroPayload extends DefaultHoodieRecordPayload {
+
+  public PartialUpdateWithLatestAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties) throws IOException {
+    if (recordBytes.length == 0) {
+      return Option.of(currentValue);
+    }
+
+    GenericRecord incomingRecord = bytesToAvro(recordBytes, schema);
+
+    // Null check is needed here to support schema evolution. The record in storage may be from old schema where
+    // the new ordering column might not be present and hence returns null.
+    if (!needUpdatingPersistedRecord(currentValue, incomingRecord, properties)) {
+      return Option.of(currentValue);
+    }
+
+    if (isDeleteRecord(incomingRecord)) {
+      return Option.empty();
+    }
+
+    GenericRecord currentRecord = (GenericRecord) currentValue;
+    // The field num in updated record may be less than old record, so only update these partial fields to old record.
+    List<Schema.Field> fields = schema.getFields();
+    fields.forEach(field -> {
+      Object value = incomingRecord.get(field.name());
+      if (Objects.nonNull(value)) {
+        currentRecord.put(field.name(), value);
+      }

Review comment:
   The user story is that some fields may be missing values upstream, and 
users don't want a missing value to overwrite the stored value with null. I 
discussed this with @danny0405 offline; there are two cases: 
   1. the field value really is null in `GenericRecord`;
   2. the field is missing from `GenericRecord`, yet we still read a null 
value for it.
   
   Because we can't tell whether a field value is really null or missing, the 
code here uniformly overwrites only the fields that have a value. 
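
   The two cases can be illustrated with a minimal sketch. It is not the payload class from this pull request: a plain `Map` is used as a hypothetical stand-in for `GenericRecord`, and the names `PartialMergeSketch` and `merge` are made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: a Map stands in for an Avro GenericRecord.
class PartialMergeSketch {

  // Copies every non-null incoming value onto a copy of the current record,
  // visiting the fields named by the (stand-in) schema.
  static Map<String, Object> merge(List<String> schemaFields,
                                   Map<String, Object> current,
                                   Map<String, Object> incoming) {
    Map<String, Object> merged = new HashMap<>(current);
    for (String field : schemaFields) {
      // get() returns null both for an explicit null and for an absent key,
      // so the two cases cannot be told apart at this point.
      Object value = incoming.get(field);
      if (Objects.nonNull(value)) {
        merged.put(field, value);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<String> schemaFields = Arrays.asList("id", "name", "city", "age");

    Map<String, Object> current = new HashMap<>();
    current.put("id", 1);
    current.put("name", "alice");
    current.put("city", "berlin");

    Map<String, Object> incoming = new HashMap<>();
    incoming.put("id", 1);
    incoming.put("name", null); // case 1: explicitly null
    incoming.put("age", 30);    // new value; "city" is absent (case 2)

    Map<String, Object> merged = merge(schemaFields, current, incoming);
    System.out.println("name = " + merged.get("name"));
    System.out.println("city = " + merged.get("city"));
    System.out.println("age  = " + merged.get("age"));
  }
}
```

   In this sketch both `name` (explicit null) and `city` (missing) keep their stored values, while `age` is taken from the incoming record. The trade-off is that an upstream writer cannot use this merge to set a field back to null.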








[GitHub] [hudi] lsyldliu commented on a change in pull request #4141: [HUDI-2815] Support partial update for streaming change logs

2021-12-18 Thread GitBox


lsyldliu commented on a change in pull request #4141:
URL: https://github.com/apache/hudi/pull/4141#discussion_r771890422



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdateWithLatestAvroPayload.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.Properties;
+
+import static org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro;
+
+/**
+ * The only difference with {@link DefaultHoodieRecordPayload} is that support update partial fields
+ * in latest record to old record instead of all fields.

Review comment:
   I will provide a concrete example here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] lsyldliu commented on a change in pull request #4141: [HUDI-2815] Support partial update for streaming change logs

2021-12-18 Thread GitBox


lsyldliu commented on a change in pull request #4141:
URL: https://github.com/apache/hudi/pull/4141#discussion_r771890386



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdateWithLatestAvroPayload.java
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import java.util.Properties;
+
+import static org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro;
+
+/**
+ * The only difference with {@link DefaultHoodieRecordPayload} is that support update partial fields
+ * in latest record to old record instead of all fields.
+ */
+public class PartialUpdateWithLatestAvroPayload extends DefaultHoodieRecordPayload {
+
+  public PartialUpdateWithLatestAvroPayload(GenericRecord record, Comparable orderingVal) {
+    super(record, orderingVal);
+  }
+
+  @Override
+  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties) throws IOException {
+    if (recordBytes.length == 0) {
+      return Option.of(currentValue);
+    }
+
+    GenericRecord incomingRecord = bytesToAvro(recordBytes, schema);
+
+    // Null check is needed here to support schema evolution. The record in storage may be from old schema where
+    // the new ordering column might not be present and hence returns null.
+    if (!needUpdatingPersistedRecord(currentValue, incomingRecord, properties)) {
+      return Option.of(currentValue);
+    }
+
+    if (isDeleteRecord(incomingRecord)) {
+      return Option.empty();
+    }
+
+    GenericRecord currentRecord = (GenericRecord) currentValue;
+    // The field num in updated record may be less than old record, so only update these partial fields to old record.
+    List<Schema.Field> fields = schema.getFields();
+    fields.forEach(field -> {
+      Object value = incomingRecord.get(field.name());
+      if (Objects.nonNull(value)) {
+        currentRecord.put(field.name(), value);
+      }

Review comment:
   The user story is that some fields may be missing values upstream, and users 
do not want those missing values to overwrite the stored values with null. I 
discussed this with @danny0405 offline; there are two cases: 
   1. the field value really is null in `GenericRecord`;
   2. the field value is missing in `GenericRecord`, but reading it still 
returns null.
   Because we cannot tell whether a field value is really null or missing, the 
code here uniformly overrides only the fields that have a value. 
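
   To make the two cases concrete, here is a minimal sketch of the same merge rule. It is not the PR's code: it uses a plain `Map` as a stand-in for `GenericRecord`, and the class name `PartialMergeSketch` and method `mergeNonNull` are hypothetical. A map lookup returns null both for a key stored with a null value and for an absent key, which mirrors the ambiguity described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in only: a Map plays the role of GenericRecord here.
public class PartialMergeSketch {

  // Returns a copy of the stored record in which only the fields that carry
  // a non-null incoming value are overridden.
  static Map<String, Object> mergeNonNull(Map<String, Object> stored,
                                          Map<String, Object> incoming) {
    Map<String, Object> merged = new LinkedHashMap<>(stored);
    // The stored record's keys stand in for the schema's field list.
    for (String field : stored.keySet()) {
      // get() yields null for an explicit null and for a missing key alike,
      // so the two cases cannot be told apart at this point.
      Object value = incoming.get(field);
      if (value != null) {
        merged.put(field, value);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> stored = new LinkedHashMap<>();
    stored.put("id", 1);
    stored.put("name", "a");
    stored.put("price", 10);

    Map<String, Object> incoming = new LinkedHashMap<>();
    incoming.put("id", 1);
    incoming.put("name", null); // case 1: value is really null
    // case 2: "price" is missing from the incoming record entirely

    // Both "name" and "price" keep their stored values.
    System.out.println(mergeNonNull(stored, incoming));
  }
}
```

   The trade-off of this rule is that an upstream writer cannot deliberately set a field to null through a partial update, since an explicit null is treated the same as a missing value.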








[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997320893


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   * 75f37f592493e7cfa4ac52ceb5c9dbb99eb1e59b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4541)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997318715


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * 75f37f592493e7cfa4ac52ceb5c9dbb99eb1e59b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4541)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997318082


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * 75f37f592493e7cfa4ac52ceb5c9dbb99eb1e59b UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997318715


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * 75f37f592493e7cfa4ac52ceb5c9dbb99eb1e59b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4541)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4380: [WIP][DO_NOT_MERGE] Testing CI run 6

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4380:
URL: https://github.com/apache/hudi/pull/4380#issuecomment-997318489


   
   ## CI report:
   
   * 4c933745f9e62d1183f851f97ddd54dbf0465246 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4531)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4380: [WIP][DO_NOT_MERGE] Testing CI run 6

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4380:
URL: https://github.com/apache/hudi/pull/4380#issuecomment-997318292


   
   ## CI report:
   
   * 4c933745f9e62d1183f851f97ddd54dbf0465246 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4380: [WIP][DO_NOT_MERGE] Testing CI run 6

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4380:
URL: https://github.com/apache/hudi/pull/4380#issuecomment-997318292


   
   ## CI report:
   
   * 4c933745f9e62d1183f851f97ddd54dbf0465246 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997318082


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * 75f37f592493e7cfa4ac52ceb5c9dbb99eb1e59b UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997317814


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   
   




[GitHub] [hudi] nsivabalan commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


nsivabalan commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997317826


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997317814


   
   ## CI report:
   
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * f5ba5e1654e7c50dc1a53a8d20c1bbe7e1267551 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997317586


   
   ## CI report:
   
   * caad6c7cc5609c426953f3e3ce8ecda91af8a9fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4536)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4538)
 
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   
   




[GitHub] [hudi] nsivabalan commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


nsivabalan commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997317653


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997317586


   
   ## CI report:
   
   * caad6c7cc5609c426953f3e3ce8ecda91af8a9fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4536)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4538)
 
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997316452


   
   ## CI report:
   
   * caad6c7cc5609c426953f3e3ce8ecda91af8a9fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4536)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4538)
 
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-997317565


   
   ## CI report:
   
   * 394525870ef7d82aa426104f807f5480acae2b7c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4478)
 
   * 45769dd17905240d5b513d304e5f9e86fe094642 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4539)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-997317314


   
   ## CI report:
   
   * 394525870ef7d82aa426104f807f5480acae2b7c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4478)
 
   * 45769dd17905240d5b513d304e5f9e86fe094642 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-997317314


   
   ## CI report:
   
   * 394525870ef7d82aa426104f807f5480acae2b7c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4478)
 
   * 45769dd17905240d5b513d304e5f9e86fe094642 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-997183906


   
   ## CI report:
   
   * 394525870ef7d82aa426104f807f5480acae2b7c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4478)
 
   
   




[hudi] branch master updated (bb99836 -> 478f9f3)

2021-12-18 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from bb99836  [HUDI-3052] Fix flaky testJsonKafkaSourceResetStrategy (#4381)
 add 478f9f3  [minor] fix NetworkUtils#getHostname (#4355)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/common/util/NetworkUtils.java | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)


[GitHub] [hudi] danny0405 merged pull request #4355: [minor] fix NetworkUtils#getHostname

2021-12-18 Thread GitBox


danny0405 merged pull request #4355:
URL: https://github.com/apache/hudi/pull/4355


   






[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997316452


   
   ## CI report:
   
   * caad6c7cc5609c426953f3e3ce8ecda91af8a9fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4536)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4538)
 
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997314432


   
   ## CI report:
   
   * caad6c7cc5609c426953f3e3ce8ecda91af8a9fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4536)
 
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   
   




[GitHub] [hudi] nsivabalan commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


nsivabalan commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997316057


   @hudi-bot run azure






[hudi] branch master updated (f57e28f -> bb99836)

2021-12-18 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from f57e28f  [MINOR] Azure CI IT tasks clean up (#4337)
 add bb99836  [HUDI-3052] Fix flaky testJsonKafkaSourceResetStrategy (#4381)

No new revisions were added by this update.

Summary of changes:
 .../utilities/sources/TestJsonKafkaSource.java | 123 +++--
 1 file changed, 64 insertions(+), 59 deletions(-)


[GitHub] [hudi] nsivabalan merged pull request #4381: [HUDI-3052] Fix flaky testJsonKafkaSourceResetStrategy

2021-12-18 Thread GitBox


nsivabalan merged pull request #4381:
URL: https://github.com/apache/hudi/pull/4381


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4381: [HUDI-3052] Fix flaky testJsonKafkaSourceResetStrategy

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4381:
URL: https://github.com/apache/hudi/pull/4381#issuecomment-997309477


   
   ## CI report:
   
   * 277da990781976b236f020da840d7273e08cbeee Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4532)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4381: [HUDI-3052] Fix flaky testJsonKafkaSourceResetStrategy

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4381:
URL: https://github.com/apache/hudi/pull/4381#issuecomment-997314843


   
   ## CI report:
   
   * 277da990781976b236f020da840d7273e08cbeee Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4532)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997313576


   
   ## CI report:
   
   * caad6c7cc5609c426953f3e3ce8ecda91af8a9fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4536)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997314432


   
   ## CI report:
   
   * caad6c7cc5609c426953f3e3ce8ecda91af8a9fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4536)
 
   * 4ad29d0b19e3d22db61755d7159d2b256b03fbc0 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997313576


   
   ## CI report:
   
   * caad6c7cc5609c426953f3e3ce8ecda91af8a9fe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4536)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997312878


   
   ## CI report:
   
   * caad6c7cc5609c426953f3e3ce8ecda91af8a9fe Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4536)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot commented on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997312878


   
   ## CI report:
   
   * caad6c7cc5609c426953f3e3ce8ecda91af8a9fe Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4536)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4382: [WIP][DO-NOT_MERGE][HUDI-3052] Test ci run dec18

2021-12-18 Thread GitBox


hudi-bot removed a comment on pull request #4382:
URL: https://github.com/apache/hudi/pull/4382#issuecomment-997312579


   
   ## CI report:
   
   * caad6c7cc5609c426953f3e3ce8ecda91af8a9fe UNKNOWN
   
   



