[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9255:
URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650991561

   
   ## CI report:
   
   * 463953fa8ffd4e41dcc02c67cf931d894a12848d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18838)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] chandu-1101 commented on issue #9141: [BUG] Example from Hudi Quick start doesnt work!

2023-07-25 Thread via GitHub


chandu-1101 commented on issue #9141:
URL: https://github.com/apache/hudi/issues/9141#issuecomment-1650981047

   Wow! Wonderful. Thank you once again. I will put the flag check and get 
back. 





[GitHub] [hudi] big-doudou commented on pull request #9182: [HUDI-6588] Fix duplicate fileId on TM partial-failover and recovery

2023-07-25 Thread via GitHub


big-doudou commented on PR #9182:
URL: https://github.com/apache/hudi/pull/9182#issuecomment-1650926469

   > Each failed attempt of a subtask would trigger an invocation of 
`StreamWriteOperatorCoordinator#subtaskFailed`, and the original write metadata 
would get cleaned.
   
   `StreamWriteOperatorCoordinator#subtaskFailed` just sets eventBuffer=null. 
How does this affect metadata cleaning?





[GitHub] [hudi] JingFengWang opened a new issue, #9285: When compiling hudi-0.13.0, the package org.apache.http does not exist error is thrown

2023-07-25 Thread via GitHub


JingFengWang opened a new issue, #9285:
URL: https://github.com/apache/hudi/issues/9285

   **Steps to reproduce the behavior:**
   **run command:**
   mvn clean package -DskipTests -Dflink1.13 -Dscala-2.11
   
   **exception log:**
   [ERROR] COMPILATION ERROR :
   [INFO] -------------------------------------------------------------
   [ERROR] 
/D:/project/hudi/hudi-common/src/main/java/org/apache/hudi/common/table/view/RemoteHoodieTableFileSystemView.java:[46,23]
 package org.apache.http does not exist
   [ERROR] 
/D:/project/hudi/hudi-common/src/main/java/org/apache/hudi/common/table/view/PriorityBasedFileSystemView.java:[35,23]
 cannot find symbol
 symbol:   class HttpStatus
 location: package org.apache.http
   [INFO] 2 errors
   [INFO] -------------------------------------------------------------
   [INFO] 

   [INFO] Reactor Summary for Hudi 0.14.0-SNAPSHOT:
   [INFO]
   [INFO] Hudi ....................................... SUCCESS [  3.645 s]
   [INFO] hudi-tests-common .......................... SUCCESS [  3.805 s]
   [INFO] hudi-common ................................ FAILURE [ 12.802 s]
   [INFO] hudi-hadoop-mr . SKIPPED
   [INFO] hudi-sync-common ... SKIPPED
   [INFO] hudi-hive-sync . SKIPPED
   [INFO] hudi-aws ... SKIPPED
   [INFO] hudi-timeline-service .. SKIPPED
   [INFO] hudi-client  SKIPPED
   [INFO] hudi-client-common . SKIPPED
   [INFO] hudi-spark-client .. SKIPPED
   [INFO] hudi-spark-datasource .. SKIPPED
   [INFO] hudi-spark-common_2.11 . SKIPPED
   [INFO] hudi-spark2_2.11 ... SKIPPED
   [INFO] hudi-java-client ... SKIPPED
   [INFO] hudi-spark_2.11  SKIPPED
   [INFO] hudi-gcp ... SKIPPED
   [INFO] hudi-utilities_2.11  SKIPPED
   [INFO] hudi-utilities-bundle_2.11 . SKIPPED
   [INFO] hudi-cli ... SKIPPED
   [INFO] hudi-flink-client .. SKIPPED
   [INFO] hudi-datahub-sync .. SKIPPED
   [INFO] hudi-adb-sync .. SKIPPED
   [INFO] hudi-sync .. SKIPPED
   [INFO] hudi-hadoop-mr-bundle .. SKIPPED
   [INFO] hudi-datahub-sync-bundle ... SKIPPED
   [INFO] hudi-hive-sync-bundle .. SKIPPED
   [INFO] hudi-aws-bundle  SKIPPED
   [INFO] hudi-gcp-bundle  SKIPPED
   [INFO] hudi-spark2.4-bundle_2.11 .. SKIPPED
   [INFO] hudi-presto-bundle . SKIPPED
   [INFO] hudi-utilities-slim-bundle_2.11  SKIPPED
   [INFO] hudi-timeline-server-bundle  SKIPPED
   [INFO] hudi-trino-bundle .. SKIPPED
   [INFO] hudi-examples .. SKIPPED
   [INFO] hudi-examples-common ... SKIPPED
   [INFO] hudi-examples-spark  SKIPPED
   [INFO] hudi-flink-datasource .. SKIPPED
   [INFO] hudi-flink1.13.x ... SKIPPED
   [INFO] hudi-flink . SKIPPED
   [INFO] hudi-examples-flink  SKIPPED
   [INFO] hudi-examples-java . SKIPPED
   [INFO] hudi-flink1.14.x ... SKIPPED
   [INFO] hudi-flink1.15.x ... SKIPPED
   [INFO] hudi-flink1.16.x ... SKIPPED
   [INFO] hudi-flink1.17.x ... SKIPPED
   [INFO] hudi-kafka-connect . SKIPPED
   [INFO] hudi-flink1.13-bundle .. SKIPPED
   [INFO] hudi-kafka-connect-bundle .. SKIPPED
   [INFO] hudi-cli-bundle_2.11 ... SKIPPED
   [INFO] hudi-spark2-common . SKIPPED
   [INFO] 

   [INFO] BUILD FAILURE
   [INFO] 

   [INFO] Total time:  21.270 s
   [INFO] Finished at: 2023-07-26T10:12:26+08:00
   [INFO] 

   [ERROR] Failed to execute goal 

[GitHub] [hudi] ad1happy2go commented on issue #9143: [SUPPORT] Failure to delete records with missing attributes from PostgresDebeziumSource

2023-07-25 Thread via GitHub


ad1happy2go commented on issue #9143:
URL: https://github.com/apache/hudi/issues/9143#issuecomment-1650912511

   @Sam-Serpoosh In this case we need to maintain global uniqueness, so a Global 
Index should be the right option. On a large dataset it might have a downside, 
but since the partition value is not coming through, we would need to do a 
lookup across the entire dataset to delete anyway. 





[GitHub] [hudi] Sam-Serpoosh commented on issue #9143: [SUPPORT] Failure to delete records with missing attributes from PostgresDebeziumSource

2023-07-25 Thread via GitHub


Sam-Serpoosh commented on issue #9143:
URL: https://github.com/apache/hudi/issues/9143#issuecomment-1650903089

   @ad1happy2go IIUC, here are the options:
   
   - Leverage `REPLICA IDENTITY FULL`, which has some downsides as mentioned in 
the PG documentation and the article I shared in my earlier comment.
   - Leverage `REPLICA IDENTITY USING INDEX`, as long as the field upon which 
we'd like to partition has a UNIQUE index in the upstream PG table.
   - Leverage the `GLOBAL_BLOOM` indexing you mentioned.
   
   Are there any downsides or trade-offs with the `GLOBAL_BLOOM` index type I 
should keep in mind? I'll try this approach as well on my end **without** 
`REPLICA IDENTITY FULL` and see how it goes. Thanks a lot!
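   For reference, a minimal write-config sketch for the `GLOBAL_BLOOM` 
experiment (property names are from Hudi's write configs; enabling the 
partition-path update flag is an assumption about what a CDC delete flow 
typically wants, not something prescribed in this thread):

```properties
# Sketch: route upserts/deletes through a global bloom index so a record
# key is unique across all partitions, not just within one.
hoodie.index.type=GLOBAL_BLOOM
# With a global index, an incoming record with a different partition value
# can relocate (or delete) the existing row in its old partition.
hoodie.bloom.index.update.partition.path=true
```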





[GitHub] [hudi] danny0405 commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…

2023-07-25 Thread via GitHub


danny0405 commented on PR #9255:
URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650874179

   Okay, a static lock mapping makes sense to me.





[GitHub] [hudi] Zouxxyy commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…

2023-07-25 Thread via GitHub


Zouxxyy commented on PR #9255:
URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650870862

   > Can you elaborate a little more why the table service client can hold a 
separate lock ?
   
   Because `InProcessLockProvider` is only valid within the same JVM process 
(see `static final Map 
LOCK_INSTANCE_PER_BASEPATH = new ConcurrentHashMap<>();`), whereas other lock 
providers are not confined to a single JVM. Maybe I'm missing something.
   Of course, it is best to use the same lock manager, because that is how it 
worked before #6732. And CI seems to be stable now.
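   A minimal sketch (not Hudi's actual class) of why an in-process lock 
provider only coordinates writers inside one JVM: lock instances live in a 
static map keyed by table base path, so two providers in the same process for 
the same table share one lock object, while a second process gets its own, 
unrelated map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class InProcessLockSketch {
  // One lock per table base path, shared across all instances in this JVM.
  static final Map<String, ReentrantReadWriteLock> LOCK_INSTANCE_PER_BASEPATH =
      new ConcurrentHashMap<>();

  private final ReentrantReadWriteLock lock;

  public InProcessLockSketch(String basePath) {
    // Same base path => same lock object within this JVM process.
    this.lock = LOCK_INSTANCE_PER_BASEPATH.computeIfAbsent(
        basePath, p -> new ReentrantReadWriteLock());
  }

  public ReentrantReadWriteLock underlying() {
    return lock;
  }
}
```

   Under this scheme, a write client and a table service client created in the 
same process against the same base path contend on the same lock even if they 
construct separate provider instances.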





[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9255:
URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650869802

   
   ## CI report:
   
   * 4e64258913f8f19b139ab1407f0c08d812f65669 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18833)
 
   * 463953fa8ffd4e41dcc02c67cf931d894a12848d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18838)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9255:
URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650864450

   
   ## CI report:
   
   * 4e64258913f8f19b139ab1407f0c08d812f65669 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18833)
 
   * 463953fa8ffd4e41dcc02c67cf931d894a12848d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] danny0405 commented on a diff in pull request #9229: [HUDI-6565] Spark offline compaction add failed retry mechanism

2023-07-25 Thread via GitHub


danny0405 commented on code in PR #9229:
URL: https://github.com/apache/hudi/pull/9229#discussion_r1274291759


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java:
##
@@ -101,6 +104,12 @@ public static class Config implements Serializable {
 public String runningMode = null;
 @Parameter(names = {"--strategy", "-st"}, description = "Strategy Class", 
required = false)
 public String strategyClassName = 
LogFileSizeBasedCompactionStrategy.class.getName();
+@Parameter(names = {"--job-max-processing-time-ms", "-mt"}, description = 
"Take effect when using --mode/-m execute or scheduleAndExecute. "
++ "If maxProcessingTimeMs passed but compaction job is still 
unfinished, hoodie would consider this job as failed and relaunch.")
+public long maxProcessingTimeMs = 0;
+@Parameter(names = {"--retry-last-failed-compaction-job", "-rc"}, 
description = "Take effect when using --mode/-m execute or scheduleAndExecute. "

Review Comment:
   > the failed inflight compaction plan which will never been re-run
   
   Can we fix that rollback by including the inflight compactions instead of 
introducing new config options?
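   For context, the behavior the quoted option text describes ("if 
maxProcessingTimeMs passed but compaction job is still unfinished, hoodie would 
consider this job as failed and relaunch") can be sketched as a 
timeout-and-relaunch loop; the class and method names below are hypothetical 
stand-ins, not Hudi's implementation:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutRetrySketch {
  // Run the job with a wall-clock budget; on timeout, cancel the attempt
  // and relaunch, up to maxAttempts times.
  public static int runWithRetry(Callable<Integer> job, long maxProcessingTimeMs,
                                 int maxAttempts) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        Future<Integer> f = pool.submit(job);
        try {
          return f.get(maxProcessingTimeMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
          f.cancel(true); // over budget: treat this attempt as failed, relaunch
        } catch (InterruptedException | ExecutionException e) {
          throw new RuntimeException(e);
        }
      }
      throw new IllegalStateException("job unfinished after " + maxAttempts + " attempts");
    } finally {
      pool.shutdownNow();
    }
  }
}
```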






[GitHub] [hudi] danny0405 commented on pull request #9182: [HUDI-6588] Fix duplicate fileId on TM partial-failover and recovery

2023-07-25 Thread via GitHub


danny0405 commented on PR #9182:
URL: https://github.com/apache/hudi/pull/9182#issuecomment-1650862935

   Each failed attempt of a subtask would trigger an invocation of 
`StreamWriteOperatorCoordinator#subtaskFailed`, and the original write metadata 
would get cleaned.
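   A minimal sketch of the behavior being discussed (hypothetical class, not 
Hudi's `StreamWriteOperatorCoordinator`): on a subtask failure the coordinator 
drops the buffered write-metadata event for that subtask, so a partial 
failover starts from a clean buffer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CoordinatorSketch {
  // Buffered write-metadata events, keyed by subtask id.
  private final Map<Integer, String> eventBuffer = new ConcurrentHashMap<>();

  public void handleEvent(int subtask, String writeMetadata) {
    eventBuffer.put(subtask, writeMetadata);
  }

  public void subtaskFailed(int subtask) {
    // The original write metadata for the failed attempt is cleaned here.
    eventBuffer.remove(subtask);
  }

  public boolean hasEvent(int subtask) {
    return eventBuffer.containsKey(subtask);
  }
}
```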





[GitHub] [hudi] danny0405 merged pull request #9274: [MINOR] fix millis append format error

2023-07-25 Thread via GitHub


danny0405 merged PR #9274:
URL: https://github.com/apache/hudi/pull/9274





[hudi] branch master updated: [MINOR] Fix millis append format error (#9274)

2023-07-25 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2a022393388 [MINOR] Fix millis append format error (#9274)
2a022393388 is described below

commit 2a0223933884cb044e7aa56f205cae926358a030
Author: KnightChess <981159...@qq.com>
AuthorDate: Wed Jul 26 10:02:53 2023 +0800

[MINOR] Fix millis append format error (#9274)
---
 .../apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java
index 5223227fce9..366d654bec1 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java
@@ -114,7 +114,7 @@ public class HoodieInstantTimeGenerator {
 
   /**
    * Creates an instant string given a valid date-time string.
-   * @param dateString A date-time string in the format yyyy-MM-dd HH:mm:ss[:SSS]
+   * @param dateString A date-time string in the format yyyy-MM-dd HH:mm:ss[.SSS]
    * @return A timeline instant
    * @throws ParseException If we cannot parse the date string
    */
@@ -124,7 +124,7 @@ public class HoodieInstantTimeGenerator {
     } catch (Exception e) {
       // Attempt to add the milliseconds in order to complete parsing
       return getInstantFromTemporalAccessor(LocalDateTime.parse(
-          String.format("%s:%s", dateString, DEFAULT_MILLIS_EXT), MILLIS_GRANULARITY_DATE_FORMATTER));
+          String.format("%s.%s", dateString, DEFAULT_MILLIS_EXT), MILLIS_GRANULARITY_DATE_FORMATTER));
     }
   }
 

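The one-character fix above can be illustrated with a self-contained sketch. 
The formatter pattern and the millis constant below are stand-ins for Hudi's 
`MILLIS_GRANULARITY_DATE_FORMATTER` and `DEFAULT_MILLIS_EXT`, not the actual 
values: milliseconds are a fractional-second field, so they must be appended 
with '.', never ':'.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class MillisAppendSketch {
  // Stand-ins for Hudi's formatter and default-millis constants.
  static final DateTimeFormatter MILLIS_FORMATTER =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
  static final String DEFAULT_MILLIS_EXT = "000";

  public static LocalDateTime parseWithMillisFallback(String dateString) {
    try {
      return MillisAppendSketch.parse(dateString);
    } catch (Exception e) {
      // Append default milliseconds to complete parsing. Using "%s:%s"
      // here (the pre-fix code) could never match the '.' in the pattern.
      return MillisAppendSketch.parse(
          String.format("%s.%s", dateString, DEFAULT_MILLIS_EXT));
    }
  }

  private static LocalDateTime parse(String s) {
    return LocalDateTime.parse(s, MILLIS_FORMATTER);
  }

  public static void main(String[] args) {
    System.out.println(parseWithMillisFallback("2023-07-26 10:02:53"));
  }
}
```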


[GitHub] [hudi] codope closed issue #8761: [SUPPORT] "Illegal Lambda Deserialization" When Leveraging PostgresDebeziumSource

2023-07-25 Thread via GitHub


codope closed issue #8761: [SUPPORT] "Illegal Lambda Deserialization" When 
Leveraging PostgresDebeziumSource
URL: https://github.com/apache/hudi/issues/8761





[GitHub] [hudi] danny0405 commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…

2023-07-25 Thread via GitHub


danny0405 commented on PR #9255:
URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650853733

   > I think about it again, and think that it's okay to not pass the txnManager
   
   Can you elaborate a little more on why the table service client can hold a 
separate lock?





[GitHub] [hudi] danny0405 commented on a diff in pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink

2023-07-25 Thread via GitHub


danny0405 commented on code in PR #9211:
URL: https://github.com/apache/hudi/pull/9211#discussion_r1274284407


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java:
##
@@ -114,10 +116,28 @@ public void testCheckpointFails() throws Exception {
   }
 
   @Test
-  public void testSubtaskFails() throws Exception {
+  public void testSubtaskFailsWithEagerFailedWritesCleanPolicy() throws 
Exception {
+testSubtaskFails()
+// the last checkpoint instant was rolled back by subTaskFails(0, 2)
+// with EAGER cleaning strategy
+.assertNoEvent()

Review Comment:
   Can we add new tests instead of modifying existing one?






[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650807747

   
   ## CI report:
   
   * cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN
   * da9dd1fc203c01d0a000d49dcbd58a0a1d729354 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18832)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18837)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650614871

   
   ## CI report:
   
   * cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN
   * da9dd1fc203c01d0a000d49dcbd58a0a1d729354 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18832)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18837)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] kazdy commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


kazdy commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650583696

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #9276: Mor perf spark33

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9276:
URL: https://github.com/apache/hudi/pull/9276#issuecomment-1650565195

   
   ## CI report:
   
   * 54a4e7e9aeabb42258e0d1f2b6cfa2960275c330 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18836)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9276: Mor perf spark33

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9276:
URL: https://github.com/apache/hudi/pull/9276#issuecomment-1650555772

   
   ## CI report:
   
   * 37d3b9365a38e8f266c1c486e9d18c9ef34be2a0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18808)
 
   * 54a4e7e9aeabb42258e0d1f2b6cfa2960275c330 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] bhasudha opened a new pull request, #9284: [DOCS] Change algolia search to leverage crawler instead of legacy do…

2023-07-25 Thread via GitHub


bhasudha opened a new pull request, #9284:
URL: https://github.com/apache/hudi/pull/9284

   …csearch
   
   ### Change Logs
   
   Migrating to new crawler based search
   
   ### Impact
   
   can affect website search functionality
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] kkalanda-score closed pull request #9283: CI (Take 2)

2023-07-25 Thread via GitHub


kkalanda-score closed pull request #9283: CI (Take 2)
URL: https://github.com/apache/hudi/pull/9283





[GitHub] [hudi] kkalanda-score opened a new pull request, #9283: CI (Take 2)

2023-07-25 Thread via GitHub


kkalanda-score opened a new pull request, #9283:
URL: https://github.com/apache/hudi/pull/9283

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[hudi] branch master updated (42799c0956f -> 03bc5549c7a)

2023-07-25 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 42799c0956f [HUDI-6438] Config parameter 'MAKE_NEW_COLUMNS_NULLABLE' 
to allow for marking a newly created column as nullable. (#9262)
 add 03bc5549c7a [HUDI-6509] Add GitHub CI for Java 17 (#9136)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/bot.yml  | 105 ++--
 .github/workflows/pr_compliance.yml|   2 +-
 hudi-common/pom.xml|   7 +
 .../org/apache/hudi/avro/TestHoodieAvroUtils.java  |   8 +-
 .../common/fs/TestHoodieWrapperFileSystem.java |  30 +++-
 .../common/functional/TestHoodieLogFormat.java |  25 ++-
 .../TestHoodieLogFormatAppendFailure.java  |  10 +-
 .../hudi/common/testutils/HoodieTestUtils.java |  29 
 .../util/TestDFSPropertiesConfiguration.java   |  14 +-
 .../hudi/common/util/TestObjectSizeCalculator.java |  30 ++--
 .../spark/sql/hive/TestHiveClientUtils.scala   |  25 ++-
 packaging/bundle-validation/Dockerfile |  25 +++
 packaging/bundle-validation/ci_run.sh  |   5 +-
 .../bundle-validation/conf/core-site.xml   |  14 +-
 .../bundle-validation/conf/hdfs-site.xml   |  25 ++-
 .../docker_java17/TestHiveClientUtils.scala|  27 ++--
 .../docker_java17/docker_java17_test.sh| 178 +
 packaging/bundle-validation/run_docker_java17.sh   | 116 ++
 packaging/bundle-validation/validate.sh|   3 +-
 pom.xml|  16 +-
 20 files changed, 605 insertions(+), 89 deletions(-)
 copy docker/hoodie/hadoop/hive_base/conf/hive-site.xml => 
packaging/bundle-validation/conf/core-site.xml (78%)
 copy hudi-flink-datasource/hudi-flink/src/test/resources/hive-site.xml => 
packaging/bundle-validation/conf/hdfs-site.xml (71%)
 copy 
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/IndexRecord.java
 => packaging/bundle-validation/docker_java17/TestHiveClientUtils.scala (60%)
 create mode 100755 
packaging/bundle-validation/docker_java17/docker_java17_test.sh
 create mode 100755 packaging/bundle-validation/run_docker_java17.sh



[GitHub] [hudi] yihua merged pull request #9136: [HUDI-6509] Add GitHub CI for Java 17

2023-07-25 Thread via GitHub


yihua merged PR #9136:
URL: https://github.com/apache/hudi/pull/9136





[GitHub] [hudi] yihua commented on pull request #9136: [HUDI-6509] Add GitHub CI for Java 17

2023-07-25 Thread via GitHub


yihua commented on PR #9136:
URL: https://github.com/apache/hudi/pull/9136#issuecomment-1650381811

   CI has passed for 
[6b33d37](https://github.com/apache/hudi/pull/9136/commits/6b33d37bc57d2b5be3649590fee6767f34cccea3).
 No need to rerun CI again.
   (screenshot: https://github.com/apache/hudi/assets/2497195/eb90a7d2-c98a-4f3d-aa83-d0c6b6d7efe4)
   





[jira] [Created] (HUDI-6591) getAllPartitionPaths perf fix need to account for parquet/orc partition path meta files

2023-07-25 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-6591:
-

 Summary: getAllPartitionPaths perf fix need to account for 
parquet/orc partition path meta files
 Key: HUDI-6591
 URL: https://issues.apache.org/jira/browse/HUDI-6591
 Project: Apache Hudi
  Issue Type: Bug
  Components: reader-core
Reporter: sivabalan narayanan


[https://github.com/apache/hudi/pull/9121/files?diff=split=0#r1263994796]

we might need to follow up to ensure we don't break the parquet/orc partition 
meta file flows. 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] yihua commented on a diff in pull request #9136: [HUDI-6509] Add GitHub CI for Java 17

2023-07-25 Thread via GitHub


yihua commented on code in PR #9136:
URL: https://github.com/apache/hudi/pull/9136#discussion_r1273945242


##
.github/workflows/bot.yml:
##
@@ -112,6 +112,61 @@ jobs:
 run:
   mvn test -Pfunctional-tests -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-pl "$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS
 
+  test-spark-java17:
+runs-on: ubuntu-latest
+strategy:
+  matrix:
+include:
+  - scalaProfile: "scala-2.12"
+sparkProfile: "spark3.3"
+sparkModules: "hudi-spark-datasource/hudi-spark3.3.x"
+  - scalaProfile: "scala-2.12"
+sparkProfile: "spark3.4"
+sparkModules: "hudi-spark-datasource/hudi-spark3.4.x"
+
+steps:
+  - uses: actions/checkout@v3
+  - name: Set up JDK 8
+uses: actions/setup-java@v3
+with:
+  java-version: '8'
+  distribution: 'adopt'
+  architecture: x64
+  - name: Build Project
+env:
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
+  SPARK_PROFILE: ${{ matrix.sparkProfile }}
+run:
+  mvn clean install -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DskipTests=true $MVN_ARGS
+  - name: Set up JDK 17
+uses: actions/setup-java@v3
+with:
+  java-version: '17'
+  distribution: 'adopt'
+  architecture: x64
+  - name: Quickstart Test
+env:
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
+  SPARK_PROFILE: ${{ matrix.sparkProfile }}
+run:
+  mvn test -Punit-tests -Pjava17 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-pl hudi-examples/hudi-examples-spark $MVN_ARGS
+  - name: UT - Common & Spark
+env:
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
+  SPARK_PROFILE: ${{ matrix.sparkProfile }}
+  SPARK_MODULES: ${{ matrix.sparkModules }}
+if: ${{ !endsWith(env.SPARK_PROFILE, '3.2') }} # skip test spark 3.2 
as it's covered by Azure CI

Review Comment:
   nit: not required.



##
.github/workflows/bot.yml:
##
@@ -151,6 +206,34 @@ jobs:
   mvn clean install -Pintegration-tests -D"$SCALA_PROFILE" 
-D"$FLINK_PROFILE" -pl hudi-flink-datasource/hudi-flink -am 
-Davro.version=1.10.0 -DskipTests=true $MVN_ARGS
   mvn verify -Pintegration-tests -D"$SCALA_PROFILE" -D"$FLINK_PROFILE" 
-pl hudi-flink-datasource/hudi-flink $MVN_ARGS
 
+  docker-java17-test:
+runs-on: ubuntu-latest
+strategy:
+  matrix:
+include:
+  - flinkProfile: 'flink1.17'
+sparkProfile: 'spark3.4'
+sparkRuntime: 'spark3.4.0'
+
+steps:
+  - uses: actions/checkout@v3
+  - name: Set up JDK 8
+uses: actions/setup-java@v3
+with:
+  java-version: '8'
+  distribution: 'adopt'
+  architecture: x64
+  - name: UT/FT - Docker Test - OpenJDK 17
+env:
+  FLINK_PROFILE: ${{ matrix.flinkProfile }}
+  SPARK_PROFILE: ${{ matrix.sparkProfile }}
+  SPARK_RUNTIME: ${{ matrix.sparkRuntime }}
+  SCALA_PROFILE: 'scala-2.12'
+if: ${{ env.SPARK_PROFILE >= 'spark3.4' }} # Only support Spark 3.4 
for now

Review Comment:
   nit: not required.



##
packaging/bundle-validation/run_docker_java17.sh:
##
@@ -0,0 +1,116 @@
+#!/bin/bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+echo "SPARK_RUNTIME: $SPARK_RUNTIME SPARK_PROFILE (optional): $SPARK_PROFILE"
+echo "SCALA_PROFILE: $SCALA_PROFILE"
+CONTAINER_NAME=hudi_docker
+DOCKER_TEST_DIR=/opt/bundle-validation/docker-test
+
+# choose versions based on build profiles
+if [[ ${SPARK_RUNTIME} == 'spark2.4.8' ]]; then
+  HADOOP_VERSION=2.7.7
+  HIVE_VERSION=2.3.9
+  DERBY_VERSION=10.10.2.0
+  FLINK_VERSION=1.13.6
+  SPARK_VERSION=2.4.8
+  SPARK_HADOOP_VERSION=2.7
+  CONFLUENT_VERSION=5.5.12
+  KAFKA_CONNECT_HDFS_VERSION=10.1.13
+  IMAGE_TAG=flink1136hive239spark248
+elif [[ ${SPARK_RUNTIME} == 'spark3.0.2' ]]; then
+  HADOOP_VERSION=2.7.7
+  HIVE_VERSION=3.1.3
+  DERBY_VERSION=10.14.1.0
+  FLINK_VERSION=1.14.6
+  SPARK_VERSION=3.0.2
+  SPARK_HADOOP_VERSION=2.7
+  CONFLUENT_VERSION=5.5.12
+  KAFKA_CONNECT_HDFS_VERSION=10.1.13
+  IMAGE_TAG=flink1146hive313spark302
+elif [[ 

[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9255:
URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650373193

   
   ## CI report:
   
   * 4e64258913f8f19b139ab1407f0c08d812f65669 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18833)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] amrishlal commented on issue #7244: [SUPPORT] DBT Merge creates duplicates

2023-07-25 Thread via GitHub


amrishlal commented on issue #7244:
URL: https://github.com/apache/hudi/issues/7244#issuecomment-1650363772

   Verified with the latest master, using the same model as @ad1happy2go above, 
and ran the model successfully.
   
   **DBT run**
   ```amrish@Amrishs-MBP github-issue-7244 % dbt run
   18:45:29  Running with dbt=1.5.3
   18:45:29  [WARNING]: Deprecated functionality
   The `source-paths` config has been renamed to `model-paths`. Please update 
your
   `dbt_project.yml` configuration to reflect this change.
   18:45:29  [WARNING]: Deprecated functionality
   The `data-paths` config has been renamed to `seed-paths`. Please update your
   `dbt_project.yml` configuration to reflect this change.
   18:45:29  Registered adapter: spark=1.5.0
   18:45:29  Found 1 model, 2 tests, 0 snapshots, 0 analyses, 357 macros, 0 
operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
   18:45:29  
   18:45:31  Concurrency: 1 threads (target='dev')
   18:45:31  
   18:45:31  1 of 1 START sql incremental model default.issue_7244_model 
 [RUN]
   18:45:38  1 of 1 OK created sql incremental model default.issue_7244_model 
... [OK in 7.93s]
   18:45:39  
   18:45:39  Finished running 1 incremental model in 0 hours 0 minutes and 9.22 
seconds (9.22s).
   18:45:39  
   18:45:39  Completed successfully
   18:45:39  
   18:45:39  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
   amrish@Amrishs-MBP github-issue-7244 % 
   ```
   
   **spark-sql verification**
   
   ```
   spark-sql> show databases;
   default
   test_database1
   Time taken: 2.562 seconds, Fetched 2 row(s)
   spark-sql> use default
> ;
   23/07/25 11:47:20 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException
   Time taken: 0.125 seconds
   spark-sql> show tables;
   issue_7244_model
   my_first_dbt_model
   my_first_dbt_model1
   my_second_dbt_model
   Time taken: 0.263 seconds, Fetched 4 row(s)
   spark-sql> select * from issue_7244_model
> ;
   23/07/25 11:47:43 WARN DFSPropertiesConfiguration: Cannot find 
HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
   23/07/25 11:47:43 WARN DFSPropertiesConfiguration: Properties file 
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
   20230725114531327   20230725114531327_1_4   2   fbb84dbc-e72f-4ac6-990a-d0205e2aaab3-0_1-33-0_20230725114531327.parquet   2   anyway   2023-07-25 11:45:31.367
   20230725114531327   20230725114531327_2_5   3   c1d85730-7a1a-4845-bb4a-1b7128f6de3d-0_2-34-0_20230725114531327.parquet   3   bye   2023-07-25 11:45:31.367
   20230725114531327   20230725114531327_0_6   1   1da126fe-eb3a-4982-ab77-f294458eefea-0_0-32-0_20230725114531327.parquet   1   yo   2023-07-25 11:45:31.367
   Time taken: 4.461 seconds, Fetched 3 row(s)
   
   ```
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650303969

   
   ## CI report:
   
   * cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN
   * da9dd1fc203c01d0a000d49dcbd58a0a1d729354 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18832)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] chattarajoy commented on pull request #8795: [HUDI-6258] support olap engine query mor table in table name without ro/rt suffix

2023-07-25 Thread via GitHub


chattarajoy commented on PR #8795:
URL: https://github.com/apache/hudi/pull/8795#issuecomment-1650303459

   Is there a place where I can find the timeline on when this can possibly be 
released?





[GitHub] [hudi] xushiyan commented on a diff in pull request #9221: [HUDI-6550] Add Hadoop conf to HiveConf for HiveSyncConfig

2023-07-25 Thread via GitHub


xushiyan commented on code in PR #9221:
URL: https://github.com/apache/hudi/pull/9221#discussion_r1273870474


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java:
##
@@ -98,8 +98,9 @@ public HiveSyncConfig(Properties props) {
 
   public HiveSyncConfig(Properties props, Configuration hadoopConf) {
 super(props, hadoopConf);
-HiveConf hiveConf = hadoopConf instanceof HiveConf
-? (HiveConf) hadoopConf : new HiveConf(hadoopConf, HiveConf.class);
+HiveConf hiveConf = new HiveConf();
+// HiveConf needs to load Hadoop conf to allow instantiation via 
AWSGlueClientFactory
+hiveConf.addResource(hadoopConf);

Review Comment:
   I think the ideal approach is to make the passed-in `hiveConf` load the 
Hadoop conf properly for `AWSGlueClientFactory` at the very beginning (when 
creating the hive sync config), so that nothing needs to be loaded at this 
point. cc @yihua 






[GitHub] [hudi] rmnlchh commented on issue #9282: [ISSUE] Hudi 0.13.0. Spark 3.3.2 Deltastreamed table read failure

2023-07-25 Thread via GitHub


rmnlchh commented on issue #9282:
URL: https://github.com/apache/hudi/issues/9282#issuecomment-1650228180

   > @rmnlchh Just curious, Did you set these configs
   > 
   > ```
   > sc.set("spark.sql.legacy.parquet.nanosAsLong", "false");
   > sc.set("spark.sql.parquet.binaryAsString", "false");
   > sc.set("spark.sql.parquet.int96AsTimestamp", "true");
   > sc.set("spark.sql.caseSensitive", "false");
   > ```
   > 
   > with your deltastreamer also? I will try to reproduce this issue .
   
   Yes, adding all the DS configs
   println(s"hoodieDeltaStreamerConfig=$hoodieDeltaStreamerConfig")
   println(s"typedProperties=$typedProperties")
   println("HERE JSC" + jsc.getConf.getAll.mkString)
   val hoodieDeltaStreamer = new HoodieDeltaStreamer(hoodieDeltaStreamerConfig, 
jsc
, FSUtils.getFs(hoodieDeltaStreamerConfig.targetBasePath, conf), 
jsc.hadoopConfiguration
, org.apache.hudi.common.util.Option.of(typedProperties)
   )
   
hoodieDeltaStreamerConfig=Config{targetBasePath='/XXX/cdp-datapipeline-curation/cdp-datapipeline-curation/datalake-deltastreamer/./tmp/CreativeDeltaStreamerTest/Domain=CampaignBuild/Table=published_creative/',
 targetTableName='published_creative', tableType='MERGE_ON_READ', 
baseFileFormat='PARQUET', 
propsFilePath='file://XXX/cdp-datapipeline-curation/cdp-datapipeline-curation/datalake-deltastreamer/src/test/resources/delta-streamer-config/dfs-source.properties',
 configs=[], 
sourceClassName='org.apache.hudi.utilities.sources.AvroKafkaSource', 
sourceOrderingField='AssetValue', 
payloadClassName='org.apache.hudi.common.model.OverwriteWithLatestAvroPayload', 
schemaProviderClassName='com.cardlytics.datapipeline.deltastreamer.schema.ResourceBasedSchemaProvider',
 
transformerClassNames=[org.apache.hudi.utilities.transform.SqlQueryBasedTransformer],
 sourceLimit=9223372036854775807, operation=UPSERT, filterDupes=false, 
enableHiveSync=false, enableMetaSync=false, forceEmptyMetaSync=false, syn
 cClientToolClassNames=org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool, 
maxPendingCompactions=5, maxPendingClustering=5, continuousMode=false, 
minSyncIntervalSeconds=0, sparkMaster='', commitOnErrors=false, 
deltaSyncSchedulingWeight=1, compactSchedulingWeight=1, 
clusterSchedulingWeight=1, deltaSyncSchedulingMinShare=0, 
compactSchedulingMinShare=0, clusterSchedulingMinShare=0, 
forceDisableCompaction=true, checkpoint='null', 
initialCheckpointProvider='null', help=false}
   typedProperties={spark.sql.avro.compression.codec=snappy, 
hoodie.datasource.hive_sync.table=published_creative, 
hoodie.datasource.hive_sync.partition_fields=Entity, 
hoodie.metadata.index.column.stats.enable=false, hoodie.index.type=BLOOM, 
hoodie.datasource.write.reconcile.schema=true, 
hoodie.deltastreamer.schemaprovider.source.schema.file=domain/campaignbuild/schema/creative.avsc,
 bootstrap.servers=PLAINTEXT://localhost:34873, hoodie.compact.inline=false, 
hoodie.deltastreamer.transformer.sql=
   SELECT
   'Creative' Entity
   ,o.CreativeId
   ,o.PreMessageImpression
   ,o.PostMessageImpression
   ,o.Assets.Type AssetType
   ,o.Assets.Slot AssetSlot
   ,o.Assets.Label AssetLabel
   ,o.Assets.Value AssetValue
   FROM
   (SELECT a.CreativeId, a.PreMessageImpression, a.PostMessageImpression, 
explode(a.Assets) Assets
FROM
 a) o
, hoodie.parquet.max.file.size=6291456, 
hoodie.datasource.write.recordkey.field=CreativeId,AssetSlot, 
hoodie.index.bloom.num_entries=6, 
hoodie.datasource.hive_sync.support_timestamp=true, 
hoodie.metadata.enable=false, schema.registry.url=http://localhost:34874, 
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator,
 hoodie.datasource.write.table.type=MERGE_ON_READ, 
hoodie.deltastreamer.source.kafka.topic=CMPN-CmpnPub-AdServer-Creative, 
hoodie.datasource.write.hive_style_partitioning=true, 
hoodie.metadata.insert.parallelism=1, 
hoodie.deltastreamer.schemaprovider.spark_avro_post_processor.enable=false, 
hoodie.parquet.compression.codec=snappy, spark.io.compression.codec=snappy, 
hoodie.deltastreamer.schemaprovider.target.schema.file=domain/campaignbuild/schema/published_creative_table.json,
 hoodie.bloom.index.prune.by.ranges=true, 
hoodie.datasource.write.partitionpath.field=Entity, 
hoodie.datasource.write.keygenerator.consistent.logical.time
 stamp.enabled=true, hoodie.parquet.block.size=6291456, 
hoodie.cleaner.fileversions.retained=2, hoodie.table.name=published_creative, 
hoodie.upsert.shuffle.parallelism=4, 
hoodie.meta.sync.client.tool.class=org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool,
 spark.sql.parquet.compression.codec=snappy, 
hoodie.datasource.write.precombine.field=AssetValue, 
hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload,
 

[GitHub] [hudi] ad1happy2go commented on issue #9282: [ISSUE] Hudi 0.13.0. Spark 3.3.2 Deltastreamed table read failure

2023-07-25 Thread via GitHub


ad1happy2go commented on issue #9282:
URL: https://github.com/apache/hudi/issues/9282#issuecomment-1650223209

   @rmnlchh Just curious, did you set these configs 
   ```
   sc.set("spark.sql.legacy.parquet.nanosAsLong", "false");
   sc.set("spark.sql.parquet.binaryAsString", "false");
   sc.set("spark.sql.parquet.int96AsTimestamp", "true");
   sc.set("spark.sql.caseSensitive", "false");
   ```
   with your deltastreamer as well? 
   I will try to reproduce this issue.
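   The four settings above can be collected in one place and applied uniformly, 
either through a SparkConf/SparkSession builder or as spark-submit flags. A 
minimal, hypothetical sketch follows; only the keys and values come from the 
comment, the dict and helper names are illustrative:

```python
# The four Spark confs suggested above, collected so they can be applied
# uniformly (e.g., to a SparkConf builder or a spark-submit command line).
# The helper below is illustrative only, not part of any Hudi or Spark API.
SUGGESTED_CONFS = {
    "spark.sql.legacy.parquet.nanosAsLong": "false",
    "spark.sql.parquet.binaryAsString": "false",
    "spark.sql.parquet.int96AsTimestamp": "true",
    "spark.sql.caseSensitive": "false",
}

def as_submit_flags(confs):
    """Render confs as spark-submit --conf arguments."""
    return [f"--conf {key}={value}" for key, value in confs.items()]

flags = as_submit_flags(SUGGESTED_CONFS)
```

   The same dict could be looped over with `sc.set(key, value)` calls, as in 
the snippet quoted above.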





[jira] [Updated] (HUDI-6589) Upsert failing for array type if value given [null]

2023-07-25 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka updated HUDI-6589:

Fix Version/s: (was: 0.12.1)

> Upsert failing for array type if value given [null]
> ---
>
> Key: HUDI-6589
> URL: https://issues.apache.org/jira/browse/HUDI-6589
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Aditya Goenka
>Priority: Critical
>
> Hudi Upserts are failing when data in a nested field is [null],
> Details in GitHub issue (see last comment) - 
> [https://github.com/apache/hudi/issues/9141]





[jira] [Closed] (HUDI-6589) Upsert failing for array type if value given [null]

2023-07-25 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka closed HUDI-6589.
---
Fix Version/s: 0.12.1
   (was: 0.15.0)
   Resolution: Resolved

> Upsert failing for array type if value given [null]
> ---
>
> Key: HUDI-6589
> URL: https://issues.apache.org/jira/browse/HUDI-6589
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Aditya Goenka
>Priority: Critical
> Fix For: 0.12.1
>
>
> Hudi Upserts are failing when data in a nested field is [null],
> Details in GitHub issue (see last comment) - 
> [https://github.com/apache/hudi/issues/9141]





[jira] [Commented] (HUDI-6589) Upsert failing for array type if value given [null]

2023-07-25 Thread Aditya Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747076#comment-17747076
 ] 

Aditya Goenka commented on HUDI-6589:
-

Upon further investigation and debugging, it has been determined that to 
address the issue related to Avro-parquet compatibility and allow arrays with 
null elements, you need to set the Spark configuration parameter 
spark.hadoop.parquet.avro.write-old-list-structure to false.

This configuration parameter controls how Avro arrays are written to Parquet. 
By default (true), arrays are written using the legacy two-level Parquet list 
layout, which cannot represent null elements and can cause compatibility 
problems with certain tools. By setting 
spark.hadoop.parquet.avro.write-old-list-structure to false, Spark uses the 
modern list layout instead, which supports arrays with null elements and 
ensures they are handled correctly during the write process.

This was not a Hudi issue. I was able to insert the record you pasted just by 
setting --conf 'spark.hadoop.parquet.avro.write-old-list-structure=false'.
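For reference, the workaround can be wired into a launch command. This is a 
minimal, hypothetical sketch; the helper function and the job script name are 
placeholders, not taken from this issue:

```python
# Sketch: render a spark-submit command that passes the Parquet-Avro
# list-structure flag described above. Only the conf key/value comes from
# the comment; everything else here is illustrative.
def build_spark_submit(job_script, extra_confs):
    """Assemble a spark-submit command line from a dict of Spark confs."""
    conf_args = " ".join(
        f"--conf '{key}={value}'" for key, value in sorted(extra_confs.items())
    )
    return f"spark-submit {conf_args} {job_script}"

cmd = build_spark_submit(
    "my_hudi_job.py",  # placeholder script name
    {
        # Write Avro arrays with the modern list layout so null elements
        # are representable in Parquet.
        "spark.hadoop.parquet.avro.write-old-list-structure": "false",
    },
)
print(cmd)
```

Setting the same key via `SparkSession.builder.config(...)` before the write 
should have the equivalent effect.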

> Upsert failing for array type if value given [null]
> ---
>
> Key: HUDI-6589
> URL: https://issues.apache.org/jira/browse/HUDI-6589
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Aditya Goenka
>Priority: Critical
> Fix For: 0.15.0
>
>
> Hudi Upserts are failing when data in a nested field is [null],
> Details in GitHub issue (see last comment) - 
> [https://github.com/apache/hudi/issues/9141]





[jira] [Resolved] (HUDI-6589) Upsert failing for array type if value given [null]

2023-07-25 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka resolved HUDI-6589.
-

> Upsert failing for array type if value given [null]
> ---
>
> Key: HUDI-6589
> URL: https://issues.apache.org/jira/browse/HUDI-6589
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Aditya Goenka
>Priority: Critical
> Fix For: 0.15.0
>
>
> Hudi Upserts are failing when data in a nested field is [null],
> Details in GitHub issue (see last comment) - 
> [https://github.com/apache/hudi/issues/9141]





[GitHub] [hudi] bhasudha commented on issue #9282: [ISSUE] Hudi 0.13.0. Spark 3.3.2 Deltastreamed table read failure

2023-07-25 Thread via GitHub


bhasudha commented on issue #9282:
URL: https://github.com/apache/hudi/issues/9282#issuecomment-1650203382

   @yihua @ad1happy2go if you can help reproduce and triage this further. 





[GitHub] [hudi] hudi-bot commented on pull request #9105: [HUDI-6459] Add Rollback and multi-writer tests for Record Level Index

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9105:
URL: https://github.com/apache/hudi/pull/9105#issuecomment-1650200162

   
   ## CI report:
   
   * 6bd80d5ce84b468293bc292f43dd0ca236c646d8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18830)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] rmnlchh opened a new issue, #9282: [ISSUE] Hudi 0.13.0. Spark 3.3.2 Deltastreamed table read failure

2023-07-25 Thread via GitHub


rmnlchh opened a new issue, #9282:
URL: https://github.com/apache/hudi/issues/9282

   As part of our pipelines, we use tables that are being deltastreamed. While 
trying to upgrade to EMR 6.11 (which brings Hudi 0.13.0/Spark 3.3.2), we 
started facing the issue discussed in
   https://github.com/apache/hudi/issues/8061#issuecomment-1447657892
   The fix with 
   sc.set("spark.sql.legacy.parquet.nanosAsLong", "false");
   sc.set("spark.sql.parquet.binaryAsString", "false");
   sc.set("spark.sql.parquet.int96AsTimestamp", "true");
   sc.set("spark.sql.caseSensitive", "false");
   worked for all the cases except those where we query deltastreamed tables.
   
   Steps to reproduce the behavior:
   
   1. Use hudi 0.13.0, spark 3.3.2
   2. Used spark configs:
   spark.shuffle.spill.compress -> true
   spark.serializer -> org.apache.spark.serializer.KryoSerializer
   spark.sql.warehouse.dir -> 
file:/XXX/cdp-datapipeline-curation/datalake-deltastreamer/spark-warehouse
   spark.sql.parquet.int96AsTimestamp -> true
   spark.io.compression.lz4.blockSize -> 64k
   spark.executor.extraJavaOptions -> -XX:+IgnoreUnrecognizedVMOptions 
--add-opens=java.base/java.lang=ALL-UNNAMED 
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED 
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED 
--add-opens=java.base/java.io=ALL-UNNAMED 
--add-opens=java.base/java.net=ALL-UNNAMED 
--add-opens=java.base/java.nio=ALL-UNNAMED 
--add-opens=java.base/java.util=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED 
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED 
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED 
--add-opens=java.base/sun.security.action=ALL-UNNAMED 
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED 
--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED
   spark.driver.host -> 127.0.0.1
   spark.sql.hive.convertMetastoreParquet -> false
   spark.broadcast.compress -> true
   spark.io.compression.codec -> snappy
   spark.sql.adaptive.skewJoin.enabled -> true
   spark.sql.parquet.binaryAsString -> false
   spark.driver.port -> 36083
   spark.rdd.compress -> true
   spark.io.compression.zstd.level -> 1
   spark.sql.caseSensitive -> false
   spark.shuffle.compress -> true
   spark.io.compression.zstd.bufferSize -> 64k
   spark.sql.catalog -> org.apache.spark.sql.hudi.catalog.HoodieCatalog
   spark.sql.parquet.int96RebaseModeInRead -> LEGACY
   spark.memory.storageFraction -> 0.20
   spark.app.name -> CreativeDeltaStreamerTest-creative-deltastreamer-1689954313
   spark.sql.parquet.datetimeRebaseModeInWrite -> LEGACY
   spark.sql.parquet.outputTimestampType -> TIMESTAMP_MICROS
   spark.sql.avro.datetimeRebaseModeInWrite -> LEGACY
   spark.sql.avro.compression.codec -> snappy
   spark.sql.legacy.parquet.nanosAsLong -> false
   spark.sql.extension -> org.apache.spark.sql.hudi.HoodieSparkSessionExtension
   spark.app.startTime -> 1689968713919
   spark.executor.id -> driver
   spark.sql.parquet.enableVectorizedReader -> true
   spark.sql.legacy.timeParserPolicy -> LEGACY
   spark.driver.extraJavaOptions -> -XX:+IgnoreUnrecognizedVMOptions 
--add-opens=java.base/java.lang=ALL-UNNAMED 
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED 
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED 
--add-opens=java.base/java.io=ALL-UNNAMED 
--add-opens=java.base/java.net=ALL-UNNAMED 
--add-opens=java.base/java.nio=ALL-UNNAMED 
--add-opens=java.base/java.util=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED 
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED 
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED 
--add-opens=java.base/sun.security.action=ALL-UNNAMED 
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED 
--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED
   spark.sql.parquet.datetimeRebaseModeInRead -> LEGACY
   spark.driver.memoryOverheadFactor -> 0.15
   spark.master -> local[*]
   spark.sql.parquet.filterPushdown -> true
   spark.executor.cores -> 1
   spark.memory.fraction -> 0.50
   spark.sql.avro.datetimeRebaseModeInRead -> LEGACY
   spark.executor.memoryOverheadFactor -> 0.20
   spark.sql.parquet.compression.codec -> snappy
   spark.sql.parquet.recordLevelFilter.enabled -> true
   spark.app.id -> local-1689968714613
   3. Used Delta streamer configs
   hoodie.datasource.hive_sync.database -> datalake_ods_local
   hoodie.datasource.hive_sync.support_timestamp -> true
   hoodie.datasource.write.precombine.field -> StartDateUtc
   hoodie.datasource.hive_sync.partition_fields -> CampaignId
   hoodie.metadata.index.column.stats.enable -> true
   hoodie.cleaner.fileversions.retained -> 2
   hoodie.parquet.max.file.size -> 6291456
   hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled -> 
true
   hoodie.bloom.index.prune.by.ranges -> true
   hoodie.parquet.block.size -> 6291456
   hoodie.metadata.enable -> true
   hoodie.datasource.hive_sync.table 

[GitHub] [hudi] hudi-bot commented on pull request #9246: [HUDI-6548] Two log compaction instants can be scheduled at the same time

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9246:
URL: https://github.com/apache/hudi/pull/9246#issuecomment-1650135501

   
   ## CI report:
   
   * c2effa1ea1fdd82828efbf88afbf6cd6be019eb3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18831)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9255:
URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650135633

   
   ## CI report:
   
   * 257b18bc9faffdf7d063fb153e5ee1b53d57 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18797)
 
   * 4e64258913f8f19b139ab1407f0c08d812f65669 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18833)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650121068

   
   ## CI report:
   
   * d062a4c9cecf2f35a2f07a046a4139c7d0aea301 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18828)
 
   * cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN
   * da9dd1fc203c01d0a000d49dcbd58a0a1d729354 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18832)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9255: [HUDI-6503] Make TableServiceClient's txnManager consistent with Writ…

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9255:
URL: https://github.com/apache/hudi/pull/9255#issuecomment-1650120837

   
   ## CI report:
   
   * 257b18bc9faffdf7d063fb153e5ee1b53d57 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18797)
 
   * 4e64258913f8f19b139ab1407f0c08d812f65669 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650105696

   
   ## CI report:
   
   * d062a4c9cecf2f35a2f07a046a4139c7d0aea301 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18828)
 
   * cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN
   * da9dd1fc203c01d0a000d49dcbd58a0a1d729354 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-6590) Improve BigQuery Sync Schema and Partition Handling

2023-07-25 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-6590:

Summary: Improve BigQuery Sync Schema and Partition Handling  (was: Improve 
BigQuery Sync Support)

> Improve BigQuery Sync Schema and Partition Handling
> ---
>
> Key: HUDI-6590
> URL: https://issues.apache.org/jira/browse/HUDI-6590
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>
> Add features for Schema evolution and listing only required base files while 
> querying the table to cut down on BigQuery usage costs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-6590) Improve BigQuery Sync Support

2023-07-25 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-6590:
---

Assignee: Timothy Brown

> Improve BigQuery Sync Support
> -
>
> Key: HUDI-6590
> URL: https://issues.apache.org/jira/browse/HUDI-6590
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Minor
>
> Add features for Schema evolution and listing only required base files while 
> querying the table to cut down on BigQuery usage costs.





[jira] [Created] (HUDI-6590) Improve BigQuery Sync Support

2023-07-25 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6590:
---

 Summary: Improve BigQuery Sync Support
 Key: HUDI-6590
 URL: https://issues.apache.org/jira/browse/HUDI-6590
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Timothy Brown


Add features for Schema evolution and listing only required base files while 
querying the table to cut down on BigQuery usage costs.





[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1650029611

   
   ## CI report:
   
   * d062a4c9cecf2f35a2f07a046a4139c7d0aea301 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18828)
 
   * cf692aeb6c7774b01a236cf058225debb8caff53 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9212: [HUDI-6541] Multiple writers should create new and different instant time to avoid marker conflict of same instant

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9212:
URL: https://github.com/apache/hudi/pull/9212#issuecomment-1650029030

   
   ## CI report:
   
   * 32766783236e3f0b5adcc973a77ff9cf782726e5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18829)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1649902580

   
   ## CI report:
   
   * d062a4c9cecf2f35a2f07a046a4139c7d0aea301 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18828)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] ad1happy2go commented on issue #7244: [SUPPORT] DBT Merge creates duplicates

2023-07-25 Thread via GitHub


ad1happy2go commented on issue #7244:
URL: https://github.com/apache/hudi/issues/7244#issuecomment-1649877906

   @faizhasan @rshanmugam1 Apologies for the delay here. I tried to reproduce 
and found out that it is working fine. I tried with 0.12.1 version.
   Model I used, exactly like we have in ticket
   ```
   {{ config(
   materialized = 'incremental',
   incremental_strategy = 'merge',
   file_format = 'hudi',
   options={
 'type': 'cow',
 'primaryKey': 'id',
 'preCombineKey': 'ts',
   },
   unique_key = 'id',
   location_root='file:///tmp/dbt/issue_7244_1/'
   ) }}
   {% if not is_incremental() %}
   
   select cast(1 as bigint) as id, 'yo' as msg, current_timestamp() as ts
   union all
   select  cast(2 as bigint) as id, 'anyway' as msg, current_timestamp() as ts
   union all
   select  cast(3 as bigint) as id, 'bye' as msg, current_timestamp() as ts
   
   {% else %}
   
   select  cast(1 as bigint) as id, 'yo_updated' as msg, current_timestamp() as 
ts
   union all
   select cast(2 as bigint) as id, 'anyway_updated' as msg, current_timestamp() 
as ts
   union all
   select  cast(3 as bigint) as id, 'bye_updated' as msg, current_timestamp() 
as ts
   
   {% endif %}
   ```
   Here are the results after the first and second runs:
   
   
![image](https://github.com/apache/hudi/assets/63430370/1d8b2c1e-7bee-44ff-a146-b62e45227c90)
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9246: [HUDI-6548] Two log compaction instants can be scheduled at the same time

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9246:
URL: https://github.com/apache/hudi/pull/9246#issuecomment-1649827320

   
   ## CI report:
   
   * 136d780d0a9ca38f88c613433f05f868be01d0d5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18734)
 
   * c2effa1ea1fdd82828efbf88afbf6cd6be019eb3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18831)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9105: [HUDI-6459] Add Rollback and multi-writer tests for Record Level Index

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9105:
URL: https://github.com/apache/hudi/pull/9105#issuecomment-1649826685

   
   ## CI report:
   
   * 5611851113d971d2f76fe2072ca87c1df0eae6ea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18404)
 
   * 6bd80d5ce84b468293bc292f43dd0ca236c646d8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18830)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9246: [HUDI-6548] Two log compaction instants can be scheduled at the same time

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9246:
URL: https://github.com/apache/hudi/pull/9246#issuecomment-1649811839

   
   ## CI report:
   
   * 136d780d0a9ca38f88c613433f05f868be01d0d5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18734)
 
   * c2effa1ea1fdd82828efbf88afbf6cd6be019eb3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9105: [HUDI-6459] Add Rollback and multi-writer tests for Record Level Index

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9105:
URL: https://github.com/apache/hudi/pull/9105#issuecomment-1649811247

   
   ## CI report:
   
   * 5611851113d971d2f76fe2072ca87c1df0eae6ea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18404)
 
   * 6bd80d5ce84b468293bc292f43dd0ca236c646d8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9211:
URL: https://github.com/apache/hudi/pull/9211#issuecomment-1649795214

   
   ## CI report:
   
   * b6afe889ca6b47f4d1d934bb552cc1c489f9d0af UNKNOWN
   * f8607c6bd9ecf09e8da2d6b372a80eff2221108d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18826)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #9246: [HUDI-6548] Two log compaction instants can be scheduled at the same time

2023-07-25 Thread via GitHub


lokeshj1703 commented on code in PR #9246:
URL: https://github.com/apache/hudi/pull/9246#discussion_r1273472332


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java:
##
@@ -277,7 +278,7 @@ public HoodieTimeline getCommitsTimeline() {
* Get all instants (commits, delta commits, replace, compaction) that 
produce new data or merge file, in the active timeline.
*/
   public HoodieTimeline getCommitsAndCompactionTimeline() {
-return getTimelineOfActions(CollectionUtils.createSet(COMMIT_ACTION, 
DELTA_COMMIT_ACTION, REPLACE_COMMIT_ACTION, COMPACTION_ACTION));
+return getTimelineOfActions(CollectionUtils.createSet(COMMIT_ACTION, 
DELTA_COMMIT_ACTION, REPLACE_COMMIT_ACTION, COMPACTION_ACTION, 
LOG_COMPACTION_ACTION));

Review Comment:
   @nsivabalan This was added as part of 
https://github.com/apache/hudi/pull/9038. Seems like this API should also 
consider inflight logcompaction. I have removed it from this PR but if it makes 
sense I will create a separate PR for it.
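
   The effect of widening that filter set can be sketched outside Hudi with a minimal stand-in (the class, instant set, and helper below are illustrative only; just the action-name strings mirror Hudi's `HoodieTimeline` constants):

   ```java
   import java.util.Arrays;
   import java.util.HashSet;
   import java.util.Set;
   import java.util.stream.Collectors;

   // Simplified stand-in for HoodieDefaultTimeline's action filtering:
   // the real class filters timeline instants by action type.
   public class TimelineFilterSketch {
       static final String COMMIT_ACTION = "commit";
       static final String DELTA_COMMIT_ACTION = "deltacommit";
       static final String REPLACE_COMMIT_ACTION = "replacecommit";
       static final String COMPACTION_ACTION = "compaction";
       static final String LOG_COMPACTION_ACTION = "logcompaction";

       // Mirrors the patched getCommitsAndCompactionTimeline(): the filter
       // set now also admits log-compaction instants.
       static Set<String> commitsAndCompactionActions() {
           return new HashSet<>(Arrays.asList(COMMIT_ACTION, DELTA_COMMIT_ACTION,
               REPLACE_COMMIT_ACTION, COMPACTION_ACTION, LOG_COMPACTION_ACTION));
       }

       public static void main(String[] args) {
           Set<String> instants = new HashSet<>(Arrays.asList(
               "commit", "logcompaction", "clean"));
           // Keep only instants whose action is in the filter set;
           // "clean" is dropped, "logcompaction" is now retained.
           Set<String> kept = instants.stream()
               .filter(commitsAndCompactionActions()::contains)
               .collect(Collectors.toSet());
           System.out.println(kept);
       }
   }
   ```

   With the old four-action set, `logcompaction` instants would have been filtered out of this timeline view, which is why schedulers could miss an inflight log compaction.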






[GitHub] [hudi] zaza commented on issue #6900: [SUPPORT]Hudi Failed to read MARKERS file

2023-07-25 Thread via GitHub


zaza commented on issue #6900:
URL: https://github.com/apache/hudi/issues/6900#issuecomment-1649736303

   This is definitely still an issue; we were hit by an error that looks 
identical to what @umehrot2 reported a while ago:
   
   ```
   ERROR UpsertPartitioner: Error trying to compute average bytes/record 
   org.apache.hudi.exception.HoodieIOException: Could not read commit details 
from 
s3://tasktop-data-platform-dev-analytical-data/simulator/workstreams/.hoodie/20230714152804208.commit
   at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:824)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:310)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.common.table.timeline.HoodieDefaultTimeline.getInstantDetails(HoodieDefaultTimeline.java:438)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.table.action.commit.UpsertPartitioner.averageBytesPerRecord(UpsertPartitioner.java:380)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.table.action.commit.UpsertPartitioner.assignInserts(UpsertPartitioner.java:169)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.table.action.commit.UpsertPartitioner.<init>(UpsertPartitioner.java:98)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpsertPartitioner(BaseSparkCommitActionExecutor.java:404)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getPartitioner(BaseSparkCommitActionExecutor.java:224)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:170)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:83)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:68)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:44)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:107)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:96)
 ~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:140) 
~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:214) 
~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:372) 
~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150) 
~[org.apache.hudi_hudi-spark3.3-bundle_2.12-0.13.1.jar:0.13.1]
   at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
 ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
 ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
 ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
   at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
 ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
   at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104)
 ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
   at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
 ~[spark-catalyst_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
   at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
 ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
   at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
 

[GitHub] [hudi] hudi-bot commented on pull request #9280: [HUDI-6587] Handle hollow commit for time travel query

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9280:
URL: https://github.com/apache/hudi/pull/9280#issuecomment-1649724112

   
   ## CI report:
   
   * db146be5542714a978e1d6fcdbd146e2aa834931 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18825)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9212: [HUDI-6541] Multiple writers should create new and different instant time to avoid marker conflict of same instant

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9212:
URL: https://github.com/apache/hudi/pull/9212#issuecomment-1649723729

   
   ## CI report:
   
   * f494be8b2b8d4e9d5a6d595eea8bc907602efd35 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18823)
 
   * 32766783236e3f0b5adcc973a77ff9cf782726e5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18829)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9212: [HUDI-6541] Multiple writers should create new and different instant time to avoid marker conflict of same instant

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9212:
URL: https://github.com/apache/hudi/pull/9212#issuecomment-1649706487

   
   ## CI report:
   
   * f494be8b2b8d4e9d5a6d595eea8bc907602efd35 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18823)
 
   * 32766783236e3f0b5adcc973a77ff9cf782726e5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-6589) Upsert failing for array type if value given [null]

2023-07-25 Thread Aditya Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Goenka updated HUDI-6589:

Priority: Critical  (was: Major)

> Upsert failing for array type if value given [null]
> ---
>
> Key: HUDI-6589
> URL: https://issues.apache.org/jira/browse/HUDI-6589
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Aditya Goenka
>Priority: Critical
> Fix For: 0.15.0
>
>
> Hudi Upserts are failing when data in a nested field is [null],
> Details in GitHub issue (see last comment) - 
> [https://github.com/apache/hudi/issues/9141]





[jira] [Created] (HUDI-6589) Upsert failing for array type if value given [null]

2023-07-25 Thread Aditya Goenka (Jira)
Aditya Goenka created HUDI-6589:
---

 Summary: Upsert failing for array type if value given [null]
 Key: HUDI-6589
 URL: https://issues.apache.org/jira/browse/HUDI-6589
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Aditya Goenka
 Fix For: 0.15.0


Hudi Upserts are failing when data in a nested field is [null],

Details in GitHub issue (see last comment) - 
[https://github.com/apache/hudi/issues/9141]





[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1649630856

   
   ## CI report:
   
   * 4d363f192f951fb54799602270fb0ca16ce19d39 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18812)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18827)
 
   * d062a4c9cecf2f35a2f07a046a4139c7d0aea301 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18828)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1649620494

   
   ## CI report:
   
   * 4d363f192f951fb54799602270fb0ca16ce19d39 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18812)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18827)
 
   * d062a4c9cecf2f35a2f07a046a4139c7d0aea301 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9209: [HUDI-6539] New LSM tree style archived timeline

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9209:
URL: https://github.com/apache/hudi/pull/9209#issuecomment-1649620034

   
   ## CI report:
   
   * 8f2dc4ec3e26f1908ae5d15f194bf70ca7dab27e UNKNOWN
   * a7f8558aaffdaab4850780224e1385c3e682372a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18824)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1649609922

   
   ## CI report:
   
   * 4d363f192f951fb54799602270fb0ca16ce19d39 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18812)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18827)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9212: [HUDI-6541] Multiple writers should create new and different instant time to avoid marker conflict of same instant

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9212:
URL: https://github.com/apache/hudi/pull/9212#issuecomment-1649609463

   
   ## CI report:
   
   * f494be8b2b8d4e9d5a6d595eea8bc907602efd35 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18823)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] kazdy commented on pull request #9277: [HUDI-6558] support SQL update for no-precombine field tables

2023-07-25 Thread via GitHub


kazdy commented on PR #9277:
URL: https://github.com/apache/hudi/pull/9277#issuecomment-1649530790

   @hudi-bot run azure





[GitHub] [hudi] SteNicholas commented on pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink

2023-07-25 Thread via GitHub


SteNicholas commented on PR #9211:
URL: https://github.com/apache/hudi/pull/9211#issuecomment-1649522478

   @danny0405, could you take a look at this pull request?





[GitHub] [hudi] hudi-bot commented on pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9211:
URL: https://github.com/apache/hudi/pull/9211#issuecomment-1649517541

   
   ## CI report:
   
   * b6afe889ca6b47f4d1d934bb552cc1c489f9d0af UNKNOWN
   * 278399029bc5dc3ab81d0366b65aaed3cf019b7c Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18821)
 
   * f8607c6bd9ecf09e8da2d6b372a80eff2221108d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18826)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9211:
URL: https://github.com/apache/hudi/pull/9211#issuecomment-1649500610

   
   ## CI report:
   
   * b6afe889ca6b47f4d1d934bb552cc1c489f9d0af UNKNOWN
   * d8e39cb69480b8eb9014f09f6b84e741b9092a9f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18635)
 
   * 278399029bc5dc3ab81d0366b65aaed3cf019b7c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18821)
 
   * f8607c6bd9ecf09e8da2d6b372a80eff2221108d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] Zouxxyy opened a new pull request, #9281: [WIP] Add HooideTable in BaseHoodieClient

2023-07-25 Thread via GitHub


Zouxxyy opened a new pull request, #9281:
URL: https://github.com/apache/hudi/pull/9281

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] stream2000 commented on pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink

2023-07-25 Thread via GitHub


stream2000 commented on PR #9211:
URL: https://github.com/apache/hudi/pull/9211#issuecomment-1649457125

   > even if the failed writes clean policy could be inferred from optimistic 
concurrent control is enabled, this support has no conflict with the inference.
   
   Do you mean that even if we can infer the lazy clean config, we still won't add the clean operator to the pipeline, so we still need this PR?





[GitHub] [hudi] adityaverma1997 commented on issue #9257: [SUPPORT] Parquet files got cleaned up even when cleaning operation failed hence leading to subsequent failed clustering and cleaning

2023-07-25 Thread via GitHub


adityaverma1997 commented on issue #9257:
URL: https://github.com/apache/hudi/issues/9257#issuecomment-1649440527

   Correct me if I am wrong here: though I am running async cleaning, the cleaning frequency is controlled by the following Hudi configuration:
   ```
   hoodie.clean.max.commits
   ```
   which is set to 10 in my case, so the cleaner will be scheduled and executed after every 10th commit.
   On the other hand, the number of commits retained when cleaning executes is controlled by the configuration below:
   ```
   hoodie.cleaner.commits.retained
   ``` 
   I have set it to 2, so it will retain the latest 2 commits and clean the remaining commits on every cleaning execution.
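   Put concretely, a minimal sketch of the setup described above (the `hoodie.clean.async` line is my assumption for how async cleaning is enabled here; the other two keys and values are taken from this issue):
   ```
   # assumption: async cleaning enabled via this flag
   hoodie.clean.async=true
   # schedule and execute the cleaner after every 10th commit
   hoodie.clean.max.commits=10
   # keep the latest 2 commits; older ones become eligible for cleaning
   hoodie.cleaner.commits.retained=2
   ```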
   
   Looking forward to your reply @danny0405 and @ad1happy2go 





[GitHub] [hudi] hudi-bot commented on pull request #9280: [HUDI-6587] Handle hollow commit for time travel query

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9280:
URL: https://github.com/apache/hudi/pull/9280#issuecomment-1649431922

   
   ## CI report:
   
   * db146be5542714a978e1d6fcdbd146e2aa834931 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18825)
 
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9274: [MINOR] fix millis append format error

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9274:
URL: https://github.com/apache/hudi/pull/9274#issuecomment-1649431799

   
   ## CI report:
   
   * 94d9dbcb05d1505d4a1d5e82dca8a8ba946f47da Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18806)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18818)
 
   
   
   





[GitHub] [hudi] SteNicholas commented on a diff in pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink

2023-07-25 Thread via GitHub


SteNicholas commented on code in PR #9211:
URL: https://github.com/apache/hudi/pull/9211#discussion_r1273233193


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java:
##
@@ -95,6 +95,9 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context 
context) {
 DataStream pipeline = Pipelines.append(conf, rowType, 
dataStream, context.isBounded());
 if (OptionsResolver.needsAsyncClustering(conf)) {
   return Pipelines.cluster(conf, rowType, pipeline);
+} else if (OptionsResolver.isLazyFailedWritesCleanPolicy(conf)) {

Review Comment:
   @stream2000, thanks for the reminder. I have modified `HoodieFlinkStreamer`. 






[GitHub] [hudi] hudi-bot commented on pull request #9280: [HUDI-6587] Handle hollow commit for time travel query

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9280:
URL: https://github.com/apache/hudi/pull/9280#issuecomment-1649418587

   
   ## CI report:
   
   * db146be5542714a978e1d6fcdbd146e2aa834931 UNKNOWN
   
   
   





[GitHub] [hudi] xushiyan commented on a diff in pull request #9221: [HUDI-6550] Add Hadoop conf to HiveConf for HiveSyncConfig

2023-07-25 Thread via GitHub


xushiyan commented on code in PR #9221:
URL: https://github.com/apache/hudi/pull/9221#discussion_r1273223195


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java:
##
@@ -98,8 +98,9 @@ public HiveSyncConfig(Properties props) {
 
   public HiveSyncConfig(Properties props, Configuration hadoopConf) {
 super(props, hadoopConf);
-HiveConf hiveConf = hadoopConf instanceof HiveConf
-? (HiveConf) hadoopConf : new HiveConf(hadoopConf, HiveConf.class);
+HiveConf hiveConf = new HiveConf();
+// HiveConf needs to load Hadoop conf to allow instantiation via 
AWSGlueClientFactory
+hiveConf.addResource(hadoopConf);

Review Comment:
   Not so sure if this is equivalent to holding the original `hadoopConf`, as this changes the order of `addResource()` calls during construction. We should be good only if we can verify the equivalence.
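   For intuition, here is a toy illustration of why layering order matters. It uses plain `java.util.Properties` rather than the real `HiveConf`/`Configuration` API (which source actually wins in `HiveConf` depends on its internals, so this is only an analogy), and all key/value strings below are made up:

```java
import java.util.Properties;

public class ConfLayering {
    // Layer `overlay` on top of `base`; on key collisions the overlay wins,
    // mimicking "resources added later override earlier ones".
    static String resolve(Properties base, Properties overlay, String key) {
        Properties merged = new Properties();
        merged.putAll(base);     // earlier layer
        merged.putAll(overlay);  // later layer wins on collision
        return merged.getProperty(key);
    }

    public static void main(String[] args) {
        Properties hadoopConf = new Properties();
        hadoopConf.setProperty("hive.metastore.uris", "thrift://cluster:9083");

        Properties hiveSite = new Properties();
        hiveSite.setProperty("hive.metastore.uris", "thrift://defaults:9083");

        // hive-site layered last wins:
        System.out.println(resolve(hadoopConf, hiveSite, "hive.metastore.uris"));
        // hadoopConf layered last wins:
        System.out.println(resolve(hiveSite, hadoopConf, "hive.metastore.uris"));
    }
}
```

   The old code passed `hadoopConf` into the `HiveConf` constructor, while the new code calls `addResource(hadoopConf)` after construction; the two orderings above correspond roughly to the before/after behavior whose equivalence is being questioned.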






[GitHub] [hudi] xushiyan commented on pull request #9221: [HUDI-6550] Add Hadoop conf to HiveConf for HiveSyncConfig

2023-07-25 Thread via GitHub


xushiyan commented on PR #9221:
URL: https://github.com/apache/hudi/pull/9221#issuecomment-1649394492

   > Hi @xushiyan, I noticed the casting from hadoopConf to hiveConf was 
introduced by this PR from you(#6202) but I couldn't find any context. Could 
you help me learn why we made that change?
   > 
   > ```
   > HiveConf hiveConf = hadoopConf instanceof HiveConf
   > ? (HiveConf) hadoopConf : new HiveConf(hadoopConf, HiveConf.class);
   > ```
   
   Hey @CTTY, it was probably meant to stay fully compatible with the original code, since that change was done as part of a refactoring.





[jira] [Updated] (HUDI-6587) Handle hollow commit for time travel query

2023-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6587:
-
Labels: pull-request-available  (was: )

> Handle hollow commit for time travel query
> --
>
> Key: HUDI-6587
> URL: https://issues.apache.org/jira/browse/HUDI-6587
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] xushiyan opened a new pull request, #9280: [HUDI-6587] Handle hollow commit for time travel query

2023-07-25 Thread via GitHub


xushiyan opened a new pull request, #9280:
URL: https://github.com/apache/hudi/pull/9280

   ### Change Logs
   
   Fail the time-travel query when the given timestamp covers any hollow commit.
   
   ### Impact
   
   Time-travel query behavior. Time travel usually won't cover a hollow commit, since hollow commits mostly exist within the recent time frame.
   
   ### Risk level
   
   Low
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] stream2000 commented on a diff in pull request #9211: [HUDI-6540] Support failed writes clean policy for Flink

2023-07-25 Thread via GitHub


stream2000 commented on code in PR #9211:
URL: https://github.com/apache/hudi/pull/9211#discussion_r1273176064


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java:
##
@@ -95,6 +95,9 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context 
context) {
 DataStream pipeline = Pipelines.append(conf, rowType, 
dataStream, context.isBounded());
 if (OptionsResolver.needsAsyncClustering(conf)) {
   return Pipelines.cluster(conf, rowType, pipeline);
+} else if (OptionsResolver.isLazyFailedWritesCleanPolicy(conf)) {

Review Comment:
   Should we also modify `HoodieFlinkStreamer` here? 






[GitHub] [hudi] hudi-bot commented on pull request #9209: [HUDI-6539] New LSM tree style archived timeline

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9209:
URL: https://github.com/apache/hudi/pull/9209#issuecomment-1649344787

   
   ## CI report:
   
   * 8f2dc4ec3e26f1908ae5d15f194bf70ca7dab27e UNKNOWN
   * c281ded6d554350dfe362cce496d6d72cfe0bbbe Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18822)
 
   * a7f8558aaffdaab4850780224e1385c3e682372a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18824)
 
   
   
   





[GitHub] [hudi] ksmou commented on a diff in pull request #9229: [HUDI-6565] Spark offline compaction add failed retry mechanism

2023-07-25 Thread via GitHub


ksmou commented on code in PR #9229:
URL: https://github.com/apache/hudi/pull/9229#discussion_r1273169390


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java:
##
@@ -101,6 +104,12 @@ public static class Config implements Serializable {
 public String runningMode = null;
 @Parameter(names = {"--strategy", "-st"}, description = "Strategy Class", 
required = false)
 public String strategyClassName = 
LogFileSizeBasedCompactionStrategy.class.getName();
+@Parameter(names = {"--job-max-processing-time-ms", "-mt"}, description = 
"Take effect when using --mode/-m execute or scheduleAndExecute. "
++ "If maxProcessingTimeMs passed but compaction job is still 
unfinished, hoodie would consider this job as failed and relaunch.")
+public long maxProcessingTimeMs = 0;
+@Parameter(names = {"--retry-last-failed-compaction-job", "-rc"}, 
description = "Take effect when using --mode/-m execute or scheduleAndExecute. "

Review Comment:
   Yes, we need it to process a failed inflight compaction plan, which would never be re-run by default.
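   For reference, a hypothetical invocation with the flags added in this PR might look like the following. Only `--mode`, `--job-max-processing-time-ms`, and `--retry-last-failed-compaction-job` come from this diff; the jar name, paths, and table name are placeholders:
   ```
   spark-submit \
 --class org.apache.hudi.utilities.HoodieCompactor \
 hudi-utilities-bundle.jar \
 --base-path s3://bucket/warehouse/my_table \
 --table-name my_table \
 --mode scheduleAndExecute \
 --job-max-processing-time-ms 3600000 \
 --retry-last-failed-compaction-job
   ```
   Here `3600000` (one hour) is just an example threshold after which an unfinished compaction job would be considered failed and relaunched.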






[GitHub] [hudi] big-doudou commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-07-25 Thread via GitHub


big-doudou commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1649335027

   > 
   
   What if this is a new task that has not yet had a successful checkpoint?





[GitHub] [hudi] hudi-bot commented on pull request #9209: [HUDI-6539] New LSM tree style archived timeline

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9209:
URL: https://github.com/apache/hudi/pull/9209#issuecomment-1649331921

   
   ## CI report:
   
   * 8f2dc4ec3e26f1908ae5d15f194bf70ca7dab27e UNKNOWN
   * c281ded6d554350dfe362cce496d6d72cfe0bbbe Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18822)
 
   * a7f8558aaffdaab4850780224e1385c3e682372a UNKNOWN
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #9209: [HUDI-6539] New LSM tree style archived timeline

2023-07-25 Thread via GitHub


hudi-bot commented on PR #9209:
URL: https://github.com/apache/hudi/pull/9209#issuecomment-1649319585

   
   ## CI report:
   
   * 9889e40cdf17f6f24ddefff010a063d4dd2c58e7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18820)
 
   * 8f2dc4ec3e26f1908ae5d15f194bf70ca7dab27e UNKNOWN
   * c281ded6d554350dfe362cce496d6d72cfe0bbbe Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18822)
 
   * a7f8558aaffdaab4850780224e1385c3e682372a UNKNOWN
   
   
   





[GitHub] [hudi] stream2000 commented on a diff in pull request #9199: [HUDI-6534]Support consistent hashing row writer

2023-07-25 Thread via GitHub


stream2000 commented on code in PR #9199:
URL: https://github.com/apache/hudi/pull/9199#discussion_r1273148438


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/ConsistentBucketIndexBulkInsertPartitionerWithRows.java:
##
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.execution.bulkinsert;
+
+import org.apache.hudi.common.model.ConsistentHashingNode;
+import org.apache.hudi.common.model.HoodieConsistentHashingMetadata;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.index.bucket.ConsistentBucketIdentifier;
+import org.apache.hudi.index.bucket.ConsistentBucketIndexUtils;
+import org.apache.hudi.index.bucket.HoodieSparkConsistentBucketIndex;
+import org.apache.hudi.keygen.BuiltinKeyGenerator;
+import org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory;
+import org.apache.hudi.table.BulkInsertPartitioner;
+import org.apache.hudi.table.ConsistentHashingBucketInsertPartitioner;
+import org.apache.hudi.table.HoodieTable;
+
+import org.apache.spark.Partitioner;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import scala.Tuple2;
+
+/**
+ * Bulk_insert partitioner of Spark row using consistent hashing bucket index.
+ */
+public class ConsistentBucketIndexBulkInsertPartitionerWithRows
+implements BulkInsertPartitioner>, 
ConsistentHashingBucketInsertPartitioner {
+
+  private final HoodieTable table;
+
+  private final String indexKeyFields;
+
+  private final List fileIdPfxList = new ArrayList<>();
+  private final Map> hashingChildrenNodes;
+
+  private Map partitionToIdentifier;
+
+  private final Option keyGeneratorOpt;
+
+  private Map> partitionToFileIdPfxIdxMap;
+
+  private final RowRecordKeyExtractor extractor;
+
+  public ConsistentBucketIndexBulkInsertPartitionerWithRows(HoodieTable table, 
boolean populateMetaFields) {
+this.indexKeyFields = table.getConfig().getBucketIndexHashField();
+this.table = table;
+this.hashingChildrenNodes = new HashMap<>();
+if (!populateMetaFields) {
+  this.keyGeneratorOpt = 
HoodieSparkKeyGeneratorFactory.getKeyGenerator(table.getConfig().getProps());
+} else {
+  this.keyGeneratorOpt = Option.empty();
+}
+this.extractor = 
RowRecordKeyExtractor.getRowRecordKeyExtractor(populateMetaFields, 
keyGeneratorOpt);
+
ValidationUtils.checkArgument(table.getMetaClient().getTableType().equals(HoodieTableType.MERGE_ON_READ),

Review Comment:
   Yes, we do a dual write during consistent hashing bucket index resizing, but CoW tables do not support writing logs. It's also a little hard to move this to a parent class, since the closest common parent of these two consistent-hashing partitioners is `BulkInsertPartitioner`.






[GitHub] [hudi] voonhous commented on issue #8892: [SUPPORT] [BUG] Duplicate fileID ??? from bucket ?? of partition found during the BucketStreamWriteFunction index bootstrap.

2023-07-25 Thread via GitHub


voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1649313876

   Yeah, we ensured that has happened. In our internal version, a rollback is performed to remove all the files that were written before the checkpoint.
   
   After that, the write is performed again from the last successful checkpoint.
   
   I'll check this again on the community's master branch later in the week. Sorry.





[jira] [Updated] (HUDI-6588) Fix duplicate fileId on TM partial-failover and recovery

2023-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6588:
-
Labels: pull-request-available  (was: )

> Fix duplicate fileId on TM partial-failover and recovery
> 
>
> Key: HUDI-6588
> URL: https://issues.apache.org/jira/browse/HUDI-6588
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[GitHub] [hudi] weimingdiit commented on a diff in pull request #9252: [HUDI-6500] Fix bug when Using the RuntimeReplaceable function in the…

2023-07-25 Thread via GitHub


weimingdiit commented on code in PR #9252:
URL: https://github.com/apache/hudi/pull/9252#discussion_r1273127379


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala:
##
@@ -391,63 +392,65 @@ case class ResolveImplementationsEarly() extends 
Rule[LogicalPlan] {
 case class ResolveImplementations() extends Rule[LogicalPlan] {
 
   override def apply(plan: LogicalPlan): LogicalPlan = {
-plan match {
-  // Convert to MergeIntoHoodieTableCommand
-  case mit@MatchMergeIntoTable(target@ResolvesToHudiTable(_), _, _) if 
mit.resolved =>
-MergeIntoHoodieTableCommand(mit.asInstanceOf[MergeIntoTable])
-
-  // Convert to UpdateHoodieTableCommand
-  case ut@UpdateTable(plan@ResolvesToHudiTable(_), _, _) if ut.resolved =>
-UpdateHoodieTableCommand(ut)
-
-  // Convert to DeleteHoodieTableCommand
-  case dft@DeleteFromTable(plan@ResolvesToHudiTable(_), _) if dft.resolved 
=>
-DeleteHoodieTableCommand(dft)
-
-  // Convert to CompactionHoodieTableCommand
-  case ct @ CompactionTable(plan @ ResolvesToHudiTable(table), operation, 
options) if ct.resolved =>
-CompactionHoodieTableCommand(table, operation, options)
-
-  // Convert to CompactionHoodiePathCommand
-  case cp @ CompactionPath(path, operation, options) if cp.resolved =>
-CompactionHoodiePathCommand(path, operation, options)
-
-  // Convert to CompactionShowOnTable
-  case csot @ CompactionShowOnTable(plan @ ResolvesToHudiTable(table), 
limit) if csot.resolved =>
-CompactionShowHoodieTableCommand(table, limit)
-
-  // Convert to CompactionShowHoodiePathCommand
-  case csop @ CompactionShowOnPath(path, limit) if csop.resolved =>
-CompactionShowHoodiePathCommand(path, limit)
-
-  // Convert to HoodieCallProcedureCommand
-  case c @ CallCommand(_, _) =>
-val procedure: Option[Procedure] = loadProcedure(c.name)
-val input = buildProcedureArgs(c.args)
-if (procedure.nonEmpty) {
-  CallProcedureHoodieCommand(procedure.get, input)
-} else {
-  c
-}
-
-  // Convert to CreateIndexCommand
-  case ci @ CreateIndex(plan @ ResolvesToHudiTable(table), indexName, 
indexType, ignoreIfExists, columns, options, output) =>
-// TODO need to resolve columns
-CreateIndexCommand(table, indexName, indexType, ignoreIfExists, 
columns, options, output)
-
-  // Convert to DropIndexCommand
-  case di @ DropIndex(plan @ ResolvesToHudiTable(table), indexName, 
ignoreIfNotExists, output) if di.resolved =>
-DropIndexCommand(table, indexName, ignoreIfNotExists, output)
-
-  // Convert to ShowIndexesCommand
-  case si @ ShowIndexes(plan @ ResolvesToHudiTable(table), output) if 
si.resolved =>
-ShowIndexesCommand(table, output)
-
-  // Covert to RefreshCommand
-  case ri @ RefreshIndex(plan @ ResolvesToHudiTable(table), indexName, 
output) if ri.resolved =>
-RefreshIndexCommand(table, indexName, output)
-
-  case _ => plan
+AnalysisHelper.allowInvokingTransformsInAnalyzer {
+  plan match {
+// Convert to MergeIntoHoodieTableCommand

Review Comment:
   > And can you also check the test failures
   
   OK. In my local environment the UTs pass; I will take a closer look at the UT failure.






[jira] [Created] (HUDI-6588) Fix duplicate fileId on TM partial-failover and recovery

2023-07-25 Thread Danny Chen (Jira)
Danny Chen created HUDI-6588:


 Summary: Fix duplicate fileId on TM partial-failover and recovery
 Key: HUDI-6588
 URL: https://issues.apache.org/jira/browse/HUDI-6588
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Reporter: Danny Chen








[jira] [Updated] (HUDI-6588) Fix duplicate fileId on TM partial-failover and recovery

2023-07-25 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6588:
-
Fix Version/s: 0.14.0

> Fix duplicate fileId on TM partial-failover and recovery
> 
>
> Key: HUDI-6588
> URL: https://issues.apache.org/jira/browse/HUDI-6588
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Danny Chen
>Priority: Major
> Fix For: 0.14.0
>
>





