[GitHub] [hudi] xiarixiaoyao commented on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-939236773


   @danny0405 I have addressed all comments; could you please review again? @nsivabalan, thanks for your attention to this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3772: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3772:
URL: https://github.com/apache/hudi/pull/3772#issuecomment-939209312


   
   ## CI report:
   
   * 0f9310f37ab73800eada15e7d4cfa6d81cb3f768 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2563)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] test-wangxiaoyu commented on pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-08 Thread GitBox


test-wangxiaoyu commented on pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#issuecomment-939231215


   **To synchronize Kerberos-managed Hive metadata, the Flink cluster must perform Kerberos authentication first and then use my new parameters.**






[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * 6eebb1a711e3655061d471c2a96b54e205dac630 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2562)
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#issuecomment-935480787


   
   ## CI report:
   
   * 42ea9882efc540dfe36610a5b343b672b1eeaee8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2542) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2557) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2561)
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3772: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3772:
URL: https://github.com/apache/hudi/pull/3772#issuecomment-939209312


   
   ## CI report:
   
   * d09ac1c568c06d5c06551e44713e5ecbec5ce5a7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2559)
   * 0f9310f37ab73800eada15e7d4cfa6d81cb3f768 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2563)
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * c14bded2e903e9c55d86fddc77d40effccca5e01 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2470) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2560)
   * 6eebb1a711e3655061d471c2a96b54e205dac630 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2562)
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3772: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3772:
URL: https://github.com/apache/hudi/pull/3772#issuecomment-939209312


   
   ## CI report:
   
   * d09ac1c568c06d5c06551e44713e5ecbec5ce5a7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2559)
   * 0f9310f37ab73800eada15e7d4cfa6d81cb3f768 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3772: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3772:
URL: https://github.com/apache/hudi/pull/3772#issuecomment-939209312


   
   ## CI report:
   
   * 7af3a772bdd286cf440ad359b7f8ef9b33b9cf01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2558)
   * d09ac1c568c06d5c06551e44713e5ecbec5ce5a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2559)
   * 0f9310f37ab73800eada15e7d4cfa6d81cb3f768 UNKNOWN
   
   




[jira] [Created] (HUDI-2537) Fix metadata table for flink

2021-10-08 Thread Danny Chen (Jira)
Danny Chen created HUDI-2537:


 Summary: Fix metadata table for flink
 Key: HUDI-2537
 URL: https://issues.apache.org/jira/browse/HUDI-2537
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.10.0


Follow-up work after:
https://github.com/apache/hudi/commit/5f32162a2fad0cd6db87972d29336dc09599bf8a



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#issuecomment-935480787


   
   ## CI report:
   
   * 42ea9882efc540dfe36610a5b343b672b1eeaee8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2542) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2557) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2561)
   
   




[GitHub] [hudi] YannByron commented on pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-08 Thread GitBox


YannByron commented on pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#issuecomment-939220103


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * c14bded2e903e9c55d86fddc77d40effccca5e01 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2470) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2560)
   * 6eebb1a711e3655061d471c2a96b54e205dac630 UNKNOWN
   
   




[GitHub] [hudi] xiarixiaoyao commented on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-939217992


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot edited a comment on pull request #3772: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3772:
URL: https://github.com/apache/hudi/pull/3772#issuecomment-939209312


   
   ## CI report:
   
   * 7af3a772bdd286cf440ad359b7f8ef9b33b9cf01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2558)
   * d09ac1c568c06d5c06551e44713e5ecbec5ce5a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2559)
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3772: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3772:
URL: https://github.com/apache/hudi/pull/3772#issuecomment-939209312


   
   ## CI report:
   
   * 7af3a772bdd286cf440ad359b7f8ef9b33b9cf01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2558)
   * d09ac1c568c06d5c06551e44713e5ecbec5ce5a7 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3772: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3772:
URL: https://github.com/apache/hudi/pull/3772#issuecomment-939209312


   
   ## CI report:
   
   * 7af3a772bdd286cf440ad359b7f8ef9b33b9cf01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2558)
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * c14bded2e903e9c55d86fddc77d40effccca5e01 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2470)
   * 6eebb1a711e3655061d471c2a96b54e205dac630 UNKNOWN
   
   




[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725422730



##
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
##
@@ -336,6 +336,11 @@ public static String getFileExtensionFromLog(Path logPath) {
     return matcher.group(3);
   }
 
+  public static String getLogFileExtension(String fullName) {
+    Matcher matcher = LOG_FILE_PATTERN.matcher(fullName);

Review comment:
   OK, I will change it and remove getLogFileExtension.
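
For readers without the full file at hand, the helper in the hunk above appears to match a log file name against a pattern and return one capture group as the extension. The following is only a minimal sketch of that idea under stated assumptions: the pattern `SKETCH_LOG_FILE_PATTERN`, the class `LogFileNameSketch`, and the method `extensionOf` are hypothetical names made up for illustration, and the pattern is a simplified stand-in, not Hudi's actual `LOG_FILE_PATTERN`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogFileNameSketch {
  // Hypothetical, simplified layout: .<fileId>_<baseCommitTime>.<extension>.<version>
  // This is an assumption for illustration, not the pattern used in FSUtils.
  private static final Pattern SKETCH_LOG_FILE_PATTERN =
      Pattern.compile("^\\.([^_]+)_([^.]+)\\.([^.]+)\\.([0-9]+)$");

  // Returns the extension (third capture group) if the name looks like a log file.
  public static Optional<String> extensionOf(String fileName) {
    Matcher matcher = SKETCH_LOG_FILE_PATTERN.matcher(fileName);
    if (!matcher.matches()) {
      return Optional.empty();
    }
    return Optional.of(matcher.group(3));
  }

  public static void main(String[] args) {
    // A name that fits the assumed layout, and one that does not.
    System.out.println(extensionOf(".f1_20211008.log.1"));
    System.out.println(extensionOf("f1_0_20211008.parquet"));
  }
}
```

Returning an `Optional` instead of throwing is a design choice of this sketch only; the hunk above is cut off before it shows how a non-matching name is handled.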








[GitHub] [hudi] YannByron commented on pull request #3691: [HUDI-2455] Adding spark_avro dependency to hudi-integ-test

2021-10-08 Thread GitBox


YannByron commented on pull request #3691:
URL: https://github.com/apache/hudi/pull/3691#issuecomment-939213789


   @nsivabalan I upgraded Maven to version 3.8.x, and that made it work. Thank you.






[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725419860



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java
##
@@ -445,10 +457,21 @@ public static HoodieMetadataConfig buildMetadataConfig(Configuration conf) {
 HoodieTableFileSystemView fsView = fsViewCache.computeIfAbsent(metaClient, tableMetaClient ->
 FileSystemViewManager.createInMemoryFileSystemViewWithTimeline(engineContext, tableMetaClient, buildMetadataConfig(job), timeline));
 List filteredBaseFiles = new ArrayList<>();
+Map> filteredLogs = new HashMap<>();
 for (Path p : entry.getValue()) {

Review comment:
   We need a map here (Map> filteredLogs), because when we build the RealtimeFileStatus we need the FileStatus of the current log. Of course we could use List>, but there is no need.








[GitHub] [hudi] hudi-bot edited a comment on pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#issuecomment-935480787


   
   ## CI report:
   
   * 42ea9882efc540dfe36610a5b343b672b1eeaee8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2542) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2557)
   
   




[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725420475



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##
@@ -99,7 +101,32 @@
 }
   }
   Option finalHoodieVirtualKeyInfo = hoodieVirtualKeyInfo;
-  partitionsToParquetSplits.keySet().forEach(partitionPath -> {
+  // deal with incremental query
+  candidateFileSplits.stream().forEach(s -> {
+try {

Review comment:
   Sorry, I do not get your point. Could you explain it in detail? Thanks.








[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725419860



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java
##
@@ -445,10 +457,21 @@ public static HoodieMetadataConfig buildMetadataConfig(Configuration conf) {
 HoodieTableFileSystemView fsView = fsViewCache.computeIfAbsent(metaClient, tableMetaClient ->
 FileSystemViewManager.createInMemoryFileSystemViewWithTimeline(engineContext, tableMetaClient, buildMetadataConfig(job), timeline));
 List filteredBaseFiles = new ArrayList<>();
+Map> filteredLogs = new HashMap<>();
 for (Path p : entry.getValue()) {

Review comment:
   ok, will change it.








[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725419790



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##
@@ -66,6 +91,139 @@
     return HoodieRealtimeInputFormatUtils.getRealtimeSplits(job, fileSplits);
   }
 
+  /**
+   * keep the logical of mor_incr_view as same as spark datasource.
+   * to do: unify the incremental view code between hive/spark-sql and spark datasource
+   */
+  @Override
+  protected List listStatusForIncrementalMode(
+      JobConf job, HoodieTableMetaClient tableMetaClient, List inputPaths) throws IOException {
+    List result = new ArrayList<>();
+    String tableName = tableMetaClient.getTableConfig().getTableName();
+    Job jobContext = Job.getInstance(job);
+
+    Option timeline = HoodieInputFormatUtils.getFilteredCommitsTimeline(jobContext, tableMetaClient);
+    if (!timeline.isPresent()) {
+      return result;
+    }
+    String lastIncrementalTs = HoodieHiveUtils.readStartCommitTime(jobContext, tableName);
+    // Total number of commits to return in this batch. Set this to -1 to get all the commits.
+    Integer maxCommits = HoodieHiveUtils.readMaxCommits(jobContext, tableName);
+    HoodieTimeline commitsTimelineToReturn = timeline.get().findInstantsAfter(lastIncrementalTs, maxCommits);
+    Option> commitsToCheck = Option.of(commitsTimelineToReturn.getInstants().collect(Collectors.toList()));
+    if (!commitsToCheck.isPresent()) {
+      return result;
+    }
+    Map> partitionsWithFileStatus  = HoodieInputFormatUtils
+        .listAffectedFilesForCommits(new Path(tableMetaClient.getBasePath()), commitsToCheck.get(), commitsTimelineToReturn);
+    // build fileGroup from fsView
+    List affectedFileStatus = new ArrayList<>();
+    partitionsWithFileStatus.forEach((key, value) -> value.forEach((k, v) -> affectedFileStatus.add(v)));
+    HoodieTableFileSystemView fsView = new HoodieTableFileSystemView(tableMetaClient, commitsTimelineToReturn, affectedFileStatus.toArray(new FileStatus[0]));
+    // build fileGroup from fsView
+    String basePath = tableMetaClient.getBasePath();
+    // filter affectedPartition by inputPaths
+    List affectedPartition = partitionsWithFileStatus.keySet().stream()
+        .filter(k -> k.isEmpty() ? inputPaths.contains(new Path(basePath)) : inputPaths.contains(new Path(basePath, k))).collect(Collectors.toList());
+    if (affectedPartition.isEmpty()) {
+      return result;
+    }
+    List fileGroups = affectedPartition.stream()
+        .flatMap(partitionPath -> fsView.getAllFileGroups(partitionPath)).collect(Collectors.toList());
+    setInputPaths(job, affectedPartition.stream()
+        .map(p -> p.isEmpty() ? basePath : new Path(basePath, p).toUri().toString()).collect(Collectors.joining(",")));
+
+    // find all file status in current partitionPath
+    FileStatus[] fileStatuses = getStatus(job);
+    Map candidateFileStatus = new HashMap<>();
+    for (int i = 0; i < fileStatuses.length; i++) {
+      String key = fileStatuses[i].getPath().toString();
+      candidateFileStatus.put(key, fileStatuses[i]);
+    }
+
+    String maxCommitTime = fsView.getLastInstant().get().getTimestamp();
+    fileGroups.stream().forEach(f -> {
+      try {
+        List baseFiles = f.getAllFileSlices().filter(slice -> slice.getBaseFile().isPresent()).collect(Collectors.toList());
+        if (!baseFiles.isEmpty()) {
+          FileStatus baseFileStatus = HoodieInputFormatUtils.getFileStatus(baseFiles.get(0).getBaseFile().get());
+          String baseFilePath = baseFileStatus.getPath().toUri().toString();
+          if (!candidateFileStatus.containsKey(baseFilePath)) {
+            throw new HoodieException("Error obtaining fileStatus for file: " + baseFilePath);
+          }
+          RealtimeFileStatus fileStatus = new RealtimeFileStatus(candidateFileStatus.get(baseFilePath));
+          fileStatus.setMaxCommitTime(maxCommitTime);
+          fileStatus.setBelongToIncrementalFileStatus(true);
+          fileStatus.setBasePath(basePath);
+          fileStatus.setBaseFilePath(baseFilePath);
+          fileStatus.setDeltaLogPaths(f.getLatestFileSlice().get().getLogFiles().map(l -> l.getPath().toString()).collect(Collectors.toList()));
+          // try to set bootstrapfileStatus
+          if (baseFileStatus instanceof LocatedFileStatusWithBootstrapBaseFile || baseFileStatus instanceof FileStatusWithBootstrapBaseFile) {
+            fileStatus.setBootStrapFileStatus(baseFileStatus);
+          }
+          result.add(fileStatus);
+        }
+        // add file group which has only logs.
+        if (f.getLatestFileSlice().isPresent() && baseFiles.isEmpty()) {
+          List logFileStatus = f.getLatestFileSlice().get().getLogFiles().map(logFile -> logFile.getFileStatus()).collect(Collectors.toList());
+          if (logFileStatus.size() > 0) {
+            RealtimeF

[GitHub] [hudi] hudi-bot edited a comment on pull request #3772: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3772:
URL: https://github.com/apache/hudi/pull/3772#issuecomment-939209312


   
   ## CI report:
   
   * 7af3a772bdd286cf440ad359b7f8ef9b33b9cf01 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2558)
   
   




[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725419147



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieMergedLogReader.java
##
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.realtime;
+
+import java.io.IOException;
+import java.text.MessageFormat;
+import java.util.Iterator;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.RecordReader;
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+/**
+ * Record Reader implementation to read avro data, to support inc queries.
+ */
+public class HoodieMergedLogReader extends AbstractRealtimeRecordReader
+implements RecordReader<NullWritable, ArrayWritable> {
+  private static final Logger LOG = 
LogManager.getLogger(AbstractRealtimeRecordReader.class);
+  private final HoodieMergedLogRecordScanner logRecordScanner;
+  private final Iterator<HoodieRecord<? extends HoodieRecordPayload>> 
logRecordsKeyIterator;
+  private ArrayWritable valueObj;

Review comment:
   agree








[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725419137



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieMergedLogReader.java
##
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.realtime;
+
+import java.io.IOException;
+import java.text.MessageFormat;
+import java.util.Iterator;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.RecordReader;
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+/**
+ * Record Reader implementation to read avro data, to support inc queries.
+ */
+public class HoodieMergedLogReader extends AbstractRealtimeRecordReader
+implements RecordReader<NullWritable, ArrayWritable> {
+  private static final Logger LOG = 
LogManager.getLogger(AbstractRealtimeRecordReader.class);
+  private final HoodieMergedLogRecordScanner logRecordScanner;
+  private final Iterator<HoodieRecord<? extends HoodieRecordPayload>> 
logRecordsKeyIterator;
+  private ArrayWritable valueObj;
+
+  private int end;
+  private int offset;
+
+  public HoodieMergedLogReader(RealtimeSplit split, JobConf job, 
HoodieMergedLogRecordScanner logRecordScanner) {
+super(split, job);
+this.logRecordScanner = logRecordScanner;
+this.end = logRecordScanner.getRecords().size();
+this.logRecordsKeyIterator = logRecordScanner.iterator();
+this.valueObj = new ArrayWritable(Writable.class, new 
Writable[getHiveSchema().getFields().size()]);
+  }
+
+  private Option<GenericRecord> buildGenericRecord(HoodieRecord record) throws IOException {
+if (usesCustomPayload) {
+  return record.getData().getInsertValue(getWriterSchema());
+} else {
+  return record.getData().getInsertValue(getReaderSchema());
+}
+  }
+
+  @Override
+  public boolean next(NullWritable key, ArrayWritable arrayWritable) throws 
IOException {
+if (!logRecordsKeyIterator.hasNext()) {
+  return false;
+}
+Option<GenericRecord> rec;
+HoodieRecord currentRecord = logRecordsKeyIterator.next();
+
+rec = buildGenericRecord(currentRecord);
+// try to skip delete record
+while (!rec.isPresent() && logRecordsKeyIterator.hasNext()) {
+  offset++;
+  rec = buildGenericRecord(logRecordsKeyIterator.next());
+}
+if (!rec.isPresent()) {
+  return false;
+}
+
+GenericRecord recordToReturn = rec.get();
+if (usesCustomPayload) {
+  // If using a custom payload, return only the projection fields. The 
readerSchema is a schema derived from
+  // the writerSchema with only the projection fields
+  recordToReturn = HoodieAvroUtils.rewriteRecord(rec.get(), 
getReaderSchema());
+}
+ArrayWritable curWritable = (ArrayWritable) 
HoodieRealtimeRecordReaderUtils.avroToArrayWritable(recordToReturn, 
getHiveSchema());
+
+if (arrayWritable != curWritable) {
+  final Writable[] arrValue = arrayWritable.get();

Review comment:
   Agreed, will remove the redundant check.








[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725419094



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieMergedLogReader.java
##
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.realtime;
+
+import java.io.IOException;
+import java.text.MessageFormat;
+import java.util.Iterator;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.RecordReader;
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+/**
+ * Record Reader implementation to read avro data, to support inc queries.
+ */
+public class HoodieMergedLogReader extends AbstractRealtimeRecordReader
+implements RecordReader<NullWritable, ArrayWritable> {
+  private static final Logger LOG = 
LogManager.getLogger(AbstractRealtimeRecordReader.class);
+  private final HoodieMergedLogRecordScanner logRecordScanner;
+  private final Iterator<HoodieRecord<? extends HoodieRecordPayload>> 
logRecordsKeyIterator;
+  private ArrayWritable valueObj;
+
+  private int end;
+  private int offset;
+
+  public HoodieMergedLogReader(RealtimeSplit split, JobConf job, 
HoodieMergedLogRecordScanner logRecordScanner) {
+super(split, job);
+this.logRecordScanner = logRecordScanner;
+this.end = logRecordScanner.getRecords().size();
+this.logRecordsKeyIterator = logRecordScanner.iterator();
+this.valueObj = new ArrayWritable(Writable.class, new 
Writable[getHiveSchema().getFields().size()]);
+  }
+
+  private Option<GenericRecord> buildGenericRecord(HoodieRecord record) throws IOException {
+if (usesCustomPayload) {
+  return record.getData().getInsertValue(getWriterSchema());
+} else {
+  return record.getData().getInsertValue(getReaderSchema());
+}
+  }
+
+  @Override
+  public boolean next(NullWritable key, ArrayWritable arrayWritable) throws 
IOException {
+if (!logRecordsKeyIterator.hasNext()) {
+  return false;
+}
+Option<GenericRecord> rec;
+HoodieRecord currentRecord = logRecordsKeyIterator.next();
+
+rec = buildGenericRecord(currentRecord);
+// try to skip delete record
+while (!rec.isPresent() && logRecordsKeyIterator.hasNext()) {
+  offset++;
+  rec = buildGenericRecord(logRecordsKeyIterator.next());
+}
+if (!rec.isPresent()) {
+  return false;
+}
+
+GenericRecord recordToReturn = rec.get();
+if (usesCustomPayload) {
+  // If using a custom payload, return only the projection fields. The 
readerSchema is a schema derived from

Review comment:
   Yes, this keeps the logic consistent with HoodieMergedLogRecordScanner. When 
we build the HoodieMergedLogRecordScanner, we use usesCustomPayload to determine the 
read schema for the log file; see the following code from RealTimeCompactedRecordReader:
   _return HoodieMergedLogRecordScanner.newBuilder()
   .withFileSystem(FSUtils.getFs(split.getPath().toString(), jobConf))
   .withBasePath(split.getBasePath())
   .withLogFilePaths(split.getDeltaLogPaths())
   .withReaderSchema(**usesCustomPayload ? getWriterSchema() : 
getReaderSchema()**)_
   
   Of course, thanks to Avro's schema evolution support, using 
getWriterSchema() or getReaderSchema() makes no difference here.
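
A minimal sketch of the schema choice described in this comment, for readers following the thread. It does not use the Hudi or Avro APIs: `SchemaChoiceSketch`, `SimpleSchema`, `chooseLogSchema`, and `project` are hypothetical names introduced only for illustration, and a schema is modeled as a plain list of field names. It assumes that a custom payload is decoded with the full writer schema and then narrowed to the projected reader fields, as the quoted diff suggests.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchemaChoiceSketch {
    /** Hypothetical stand-in for an Avro schema: an ordered list of field names. */
    static final class SimpleSchema {
        final List<String> fields;

        SimpleSchema(String... fields) {
            this.fields = Arrays.asList(fields);
        }
    }

    /** Custom payloads get the full writer schema; otherwise the projected reader schema is used. */
    static SimpleSchema chooseLogSchema(boolean usesCustomPayload, SimpleSchema writer, SimpleSchema reader) {
        return usesCustomPayload ? writer : reader;
    }

    /** Keeps only the fields named by the reader schema, in reader order. */
    static Map<String, Object> project(Map<String, Object> record, SimpleSchema reader) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String field : reader.fields) {
            projected.put(field, record.get(field));
        }
        return projected;
    }

    public static void main(String[] args) {
        SimpleSchema writer = new SimpleSchema("uuid", "name", "age");
        SimpleSchema reader = new SimpleSchema("uuid", "age");

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("uuid", "id-1");
        record.put("name", "alice");
        record.put("age", 30);

        // With a custom payload the record is read with the writer schema,
        // then narrowed to the projection before being handed to the caller.
        SimpleSchema chosen = chooseLogSchema(true, writer, reader);
        System.out.println(chosen.fields);
        System.out.println(project(record, reader).keySet());
    }
}
```

Using the same flag in both the scanner builder and the reader keeps the two code paths in agreement about which schema the log records were decoded with.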








[GitHub] [hudi] hudi-bot commented on pull request #3772: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot commented on pull request #3772:
URL: https://github.com/apache/hudi/pull/3772#issuecomment-939209312


   
   ## CI report:
   
   * 7af3a772bdd286cf440ad359b7f8ef9b33b9cf01 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] danny0405 opened a new pull request #3772: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


danny0405 opened a new pull request #3772:
URL: https://github.com/apache/hudi/pull/3772


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] danny0405 closed pull request #3767: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


danny0405 closed pull request #3767:
URL: https://github.com/apache/hudi/pull/3767


   






[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725418164



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieMergedLogReader.java
##
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.realtime;
+
+import java.io.IOException;
+import java.text.MessageFormat;
+import java.util.Iterator;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.RecordReader;
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+/**
+ * Record Reader implementation to read avro data, to support inc queries.
+ */
+public class HoodieMergedLogReader extends AbstractRealtimeRecordReader
+implements RecordReader<NullWritable, ArrayWritable> {
+  private static final Logger LOG = 
LogManager.getLogger(AbstractRealtimeRecordReader.class);
+  private final HoodieMergedLogRecordScanner logRecordScanner;
+  private final Iterator<HoodieRecord<? extends HoodieRecordPayload>> 
logRecordsKeyIterator;
+  private ArrayWritable valueObj;
+
+  private int end;
+  private int offset;
+
+  public HoodieMergedLogReader(RealtimeSplit split, JobConf job, 
HoodieMergedLogRecordScanner logRecordScanner) {
+super(split, job);
+this.logRecordScanner = logRecordScanner;
+this.end = logRecordScanner.getRecords().size();
+this.logRecordsKeyIterator = logRecordScanner.iterator();
+this.valueObj = new ArrayWritable(Writable.class, new 
Writable[getHiveSchema().getFields().size()]);
+  }
+
+  private Option<GenericRecord> buildGenericRecord(HoodieRecord record) throws IOException {
+if (usesCustomPayload) {
+  return record.getData().getInsertValue(getWriterSchema());
+} else {
+  return record.getData().getInsertValue(getReaderSchema());
+}
+  }
+
+  @Override
+  public boolean next(NullWritable key, ArrayWritable arrayWritable) throws 
IOException {
+if (!logRecordsKeyIterator.hasNext()) {

Review comment:
   agree








[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

2021-10-08 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725418146



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieMergedLogReader.java
##
@@ -0,0 +1,144 @@
+/*

Review comment:
   ok








[GitHub] [hudi] hudi-bot edited a comment on pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#issuecomment-939200284


   
   ## CI report:
   
   * b4808aaf973608255c97e1eb1f46ff04d9bb4bee Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2556)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#issuecomment-935480787


   
   ## CI report:
   
   * 42ea9882efc540dfe36610a5b343b672b1eeaee8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2542)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2557)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] YannByron removed a comment on pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-08 Thread GitBox


YannByron removed a comment on pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#issuecomment-938550725


   @hudi-bot rerun






[GitHub] [hudi] YannByron commented on pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-08 Thread GitBox


YannByron commented on pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#issuecomment-939206082


   @hudi-bot run azure






[GitHub] [hudi] yanghua commented on a change in pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-08 Thread GitBox


yanghua commented on a change in pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#discussion_r725413346



##
File path: hudi-common/src/main/java/org/apache/hudi/common/data/HoodieData.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.data;
+
+import org.apache.hudi.common.function.SerializableFunction;
+
+import java.io.Serializable;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Properties;
+
+/**
+ * An abstraction for a data collection of objects in type T to store the 
reference
+ * and do transformation.
+ *
+ * @param <T> type of object.
+ */
+public abstract class HoodieData<T> implements Serializable {

Review comment:
   Glad to see that we are moving in a simpler direction. HoodieData contains some 
operation APIs. WDYT about the similar operation APIs that exist in 
`HoodieEngineContext`? Will we remove them and make `HoodieEngineContext` 
a pure context object?
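
To make the question concrete, here is a minimal sketch of a collection abstraction that owns its transformation operations, so the engine context would not need to expose them. It is not the HoodieData API from this PR: `HoodieDataSketch`, `Data`, `ListData`, `map`, and `collectAsList` are hypothetical names, and the list-backed implementation is only an assumption about what a non-distributed engine might use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class HoodieDataSketch {
    /** Hypothetical engine-agnostic collection: callers transform data without knowing the engine. */
    abstract static class Data<T> {
        abstract <O> Data<O> map(Function<T, O> fn);

        abstract List<T> collectAsList();
    }

    /** In-memory implementation backed by a java.util.List. */
    static final class ListData<T> extends Data<T> {
        private final List<T> items;

        ListData(List<T> items) {
            // Defensive copy so later changes to the caller's list do not leak in.
            this.items = new ArrayList<>(items);
        }

        @Override
        <O> Data<O> map(Function<T, O> fn) {
            List<O> out = new ArrayList<>(items.size());
            for (T item : items) {
                out.add(fn.apply(item));
            }
            return new ListData<>(out);
        }

        @Override
        List<T> collectAsList() {
            return new ArrayList<>(items);
        }
    }

    public static void main(String[] args) {
        Data<String> names = new ListData<>(Arrays.asList("a", "bb", "ccc"));
        // The transformation lives on the data object, not on a context object.
        System.out.println(names.map(String::length).collectAsList());
    }
}
```

With the operations on the data type itself, a context object could be reduced to holding engine configuration and handles, which is the "pure context" shape the question refers to.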








[GitHub] [hudi] hudi-bot edited a comment on pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#issuecomment-939200284


   
   ## CI report:
   
   * b4808aaf973608255c97e1eb1f46ff04d9bb4bee Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2556)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-08 Thread GitBox


hudi-bot commented on pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#issuecomment-939200284


   
   ## CI report:
   
   * b4808aaf973608255c97e1eb1f46ff04d9bb4bee UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] test-wangxiaoyu opened a new pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-08 Thread GitBox


test-wangxiaoyu opened a new pull request #3771:
URL: https://github.com/apache/hudi/pull/3771


   **Previously, Hudi did not support synchronizing with a Kerberos-managed Hive. This 
PR adds two parameters to Hudi's Hive metadata sync to support synchronizing with a 
Kerberos-managed Hive. Both JDBC and HMS modes are supported.**
   
   **The new parameters**:
   1) hive_sync.use_kerberos (default: false; true enables the 
Kerberos function of Hive Sync)
   2) hive_sync.kerberos_principal (the principal used to connect to Hive; 
note: this principal is not the principal used by Hive)
   
   **Flink sample**:
   
   CREATE TABLE t1(
 uuid VARCHAR(20),
 name VARCHAR(10),
 age INT,
 ts TIMESTAMP(3),
 `partition` VARCHAR(20)
   )
   PARTITIONED BY (`partition`)
   with(
'connector' = 'hudi',
'hive_sync.enable'='true',
'hive_sync.db'='test',
'hive_sync.table'='t1',
'hive_sync.mode'='hms',
'path' = 'hdfs://ip:8020/warehouse/hudi/t1',
'hive_sync.metastore.uris'='thrift://ip:9083',
'hive_sync.use_kerberos' = 'true',
'hive_sync.kerberos_principal' = 'hive/_HOST@BIGDATA'
   )
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3769: [HUDI-2005][WIP] Fixing partition path creation in AbstractTableFileSystemView

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3769:
URL: https://github.com/apache/hudi/pull/3769#issuecomment-939013386


   
   ## CI report:
   
   * 48f46f8ec7bd5fe08badae3e6c201a6009bfa2de Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2555)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Assigned] (HUDI-2532) Set right default value for max delta commits for compaction in metadata table

2021-10-08 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2532:
-

Assignee: Manoj Govindassamy

> Set right default value for max delta commits for compaction in metadata 
> table 
> ---
>
> Key: HUDI-2532
> URL: https://issues.apache.org/jira/browse/HUDI-2532
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Major
> Fix For: 0.10.0
>
>
> Set the right default value of 10 for max delta commits for compaction in the 
> metadata table. As of now, it is set to 24, which is huge. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3769: [HUDI-2005][WIP] Fixing partition path creation in AbstractTableFileSystemView

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3769:
URL: https://github.com/apache/hudi/pull/3769#issuecomment-939013386


   
   ## CI report:
   
   * 8e536bed0eb450dac9cd49e2e9e9ae24dd16b73e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2554)
 
   * 48f46f8ec7bd5fe08badae3e6c201a6009bfa2de Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2555)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3769: [HUDI-2005][WIP] Fixing partition path creation in AbstractTableFileSystemView

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3769:
URL: https://github.com/apache/hudi/pull/3769#issuecomment-939013386


   
   ## CI report:
   
   * 8e536bed0eb450dac9cd49e2e9e9ae24dd16b73e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2554)
 
   * 48f46f8ec7bd5fe08badae3e6c201a6009bfa2de UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3769: [HUDI-2005][WIP] Fixing partition path creation in AbstractTableFileSystemView

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3769:
URL: https://github.com/apache/hudi/pull/3769#issuecomment-939013386


   
   ## CI report:
   
   * 8e536bed0eb450dac9cd49e2e9e9ae24dd16b73e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2554)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3769: [HUDI-2005][WIP] Fixing partition path creation in AbstractTableFileSystemView

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3769:
URL: https://github.com/apache/hudi/pull/3769#issuecomment-939013386


   
   ## CI report:
   
   * 6d2736aac89a72f32f518faf94702a12574b7f19 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2552)
 
   * 8e536bed0eb450dac9cd49e2e9e9ae24dd16b73e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2554)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3769: [HUDI-2005][WIP] Fixing partition path creation in AbstractTableFileSystemView

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3769:
URL: https://github.com/apache/hudi/pull/3769#issuecomment-939013386


   
   ## CI report:
   
   * 6d2736aac89a72f32f518faf94702a12574b7f19 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2552)
 
   * 8e536bed0eb450dac9cd49e2e9e9ae24dd16b73e UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3740: [HUDI-2496] Insert duplicate keys when precombined is deactivated

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3740:
URL: https://github.com/apache/hudi/pull/3740#issuecomment-931381693


   
   ## CI report:
   
   * e4b4c092dc5e911ba265e6386d736faa932e5c7c UNKNOWN
   * 849c3476ecd00486984052ce2b33c25924532add UNKNOWN
   * 6fcad17b918536cc5bfeb71a95f719bd71cae664 UNKNOWN
   * bd32e53983967a9c261c7def3588118625f5130a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2553)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3740: [HUDI-2496] Insert duplicate keys when precombined is deactivated

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3740:
URL: https://github.com/apache/hudi/pull/3740#issuecomment-931381693


   
   ## CI report:
   
   * e4b4c092dc5e911ba265e6386d736faa932e5c7c UNKNOWN
   * 849c3476ecd00486984052ce2b33c25924532add UNKNOWN
   * 6fcad17b918536cc5bfeb71a95f719bd71cae664 UNKNOWN
   * 159f85af4cc63de1199636ad69746789c607210a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2541)
 
   * bd32e53983967a9c261c7def3588118625f5130a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2553)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3740: [HUDI-2496] Insert duplicate keys when precombined is deactivated

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3740:
URL: https://github.com/apache/hudi/pull/3740#issuecomment-931381693


   
   ## CI report:
   
   * e4b4c092dc5e911ba265e6386d736faa932e5c7c UNKNOWN
   * 849c3476ecd00486984052ce2b33c25924532add UNKNOWN
   * 6fcad17b918536cc5bfeb71a95f719bd71cae664 UNKNOWN
   * 159f85af4cc63de1199636ad69746789c607210a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2541)
 
   * bd32e53983967a9c261c7def3588118625f5130a UNKNOWN
   
   




[GitHub] [hudi] helanto commented on a change in pull request #3740: [HUDI-2496] Insert duplicate keys when precombined is deactivated

2021-10-08 Thread GitBox


helanto commented on a change in pull request #3740:
URL: https://github.com/apache/hudi/pull/3740#discussion_r725275190



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##
@@ -257,6 +257,21 @@ private boolean writeUpdateRecord(HoodieRecord 
hoodieRecord, Option hoodieRecord) throws 
IOException {

Review comment:
   I noticed that to be honest. I decided to go with `boolean` just to keep 
it consistent with methods `writeUpdateRecord` and `writeRecord`. I will change!








[GitHub] [hudi] nsivabalan closed pull request #3742: [HUDI-2512][WIP] Fixing adding extra metadata for multi-writer

2021-10-08 Thread GitBox


nsivabalan closed pull request #3742:
URL: https://github.com/apache/hudi/pull/3742


   






[GitHub] [hudi] nsivabalan commented on pull request #3742: [HUDI-2512][WIP] Fixing adding extra metadata for multi-writer

2021-10-08 Thread GitBox


nsivabalan commented on pull request #3742:
URL: https://github.com/apache/hudi/pull/3742#issuecomment-939078694


   Closing this as it's not required. 






[GitHub] [hudi] codejoyan opened a new issue #3770: [SUPPORT] Clustering on a metadata enabled table

2021-10-08 Thread GitBox


codejoyan opened a new issue #3770:
URL: https://github.com/apache/hudi/issues/3770


   Spark 3.1.2
   Hudi 0.9.0
   It looks like I have hit a bug. The steps below give the context. 
   
   1. Bootstrapped a COW table using bulk_insert with these properties:
   option(HoodieMetadataConfig.METADATA_ENABLE_PROP, "true").
   option(HoodieMetadataConfig.METADATA_METRICS_ENABLE_PROP, "true")
   
   2. Ran async clustering on all the partitions.
   
   3. Overwrote a partition using bulk insert, with metadata and metadata 
metrics both enabled.
   Step 3 failed with the error below, and the metadata table appears to be 
corrupted. Every subsequent commit now fails, and I am stuck.
   ```
   21/09/29 12:34:06 INFO org.apache.hudi.metadata.HoodieBackedTableMetadata: 
Opened metadata log files from 
[gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.1_0-131-119,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.2_0-141-128,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.3_0-151-137,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.4_0-163-148,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.5_0-173-157,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.6_0-183-166,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.7_0-193-175,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.8_0-203-184,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.9_0-213-193,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.10_0-223-202,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.11_0-233-211,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.12_0-243-220,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.13_0-253-229,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.14_0-263-238,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.15_0-273-247,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.16_0-283-256,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.17_0-293-265,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.18_0-303-274,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.19_0-313-283,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.20_0-323-292,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.21_0-333-301,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.22_0-343-310,
 
gs://udp-hudi-storage3/store_visit_scan_bootstrap_hudi/.hoodie/metadata/files/.3c7ae5cf-0a6b-446b-92bb-c341049fb9fd-0_20210928133730001.log.23_0-353-319]
 at instant (dataset instant=20210929122502, metadata instant=20210929093134) 
in 2627 ms
   21/09/29 12:34:06 INFO org.apache.hudi.metadata.HoodieMetadataMetrics: 
Updating metadata metrics (scan.totalDuration=2762ms, scan.count=1)
   21/09/29 12:34:06 INFO org.apache.hudi.metadata.HoodieMetadataMetrics: 
Updating metadata metrics (basefile_read.totalDuration=8ms, 
basefile_read.count=1)
   21/09/29 12:34:06 INFO org.apache.hudi.metadata.HoodieBackedTableMetadata: 
Metadata read for key op_cmpny_cd=WMT-US/visit_date=2021-01-04 took 
[baseFileRead, logMerge] [8, 0] ms
   21/09/29 12:34:06 INFO org.

[GitHub] [hudi] hudi-bot edited a comment on pull request #3769: [HUDI-2005][WIP] Fixing partition path creation in AbstractTableFileSystemView

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3769:
URL: https://github.com/apache/hudi/pull/3769#issuecomment-939013386


   
   ## CI report:
   
   * 6d2736aac89a72f32f518faf94702a12574b7f19 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2552)
 
   
   




[GitHub] [hudi] nsivabalan commented on a change in pull request #3740: [HUDI-2496] Insert duplicate keys when precombined is deactivated

2021-10-08 Thread GitBox


nsivabalan commented on a change in pull request #3740:
URL: https://github.com/apache/hudi/pull/3740#discussion_r725212936



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieClientOnCopyOnWriteStorage.java
##
@@ -710,6 +710,57 @@ private void testHoodieConcatHandle(HoodieWriteConfig 
config, boolean isPrepped)
 2, false, config.populateMetaFields());
   }
 
+  /**
+   * Test Insert API for HoodieConcatHandle when incoming entries contain 
duplicate keys.
+   */
+  @ParameterizedTest
+  @MethodSource("populateMetaFieldsParams")

Review comment:
   We don't need to test the case where populateMetaFieldsParams = false; 
we are trying to keep our test run time in check. Can we please remove the 
parameterization from the new tests that are being added? 

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##
@@ -257,6 +257,21 @@ private boolean writeUpdateRecord(HoodieRecord 
hoodieRecord, Option hoodieRecord) throws 
IOException {

Review comment:
   Sorry, a last few nits. 
   We don't really need to return a boolean here. 
   ```
   if (insertRecord.isPresent() && insertRecord.get().equals(IGNORE_RECORD)) {
     return;
   }
   ```
   Neither caller uses the return value. 
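
A minimal sketch of the suggestion above, not the actual `HoodieMergeHandle` code. The class `MergeHandleSketch`, the method `writeInsertRecord`, the `written` list, and the string sentinel used for `IGNORE_RECORD` are all hypothetical names chosen for illustration, and `java.util.Optional` stands in for Hudi's `Option`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Simplified stand-in for a merge handle; not the real HoodieMergeHandle.
class MergeHandleSketch {
  // Hypothetical sentinel marking a record that should be skipped.
  static final String IGNORE_RECORD = "__IGNORE__";

  // Records that were actually written, kept in memory for this sketch.
  final List<String> written = new ArrayList<>();

  // Returns void: no caller consumes a result, so skipping is an early return.
  void writeInsertRecord(Optional<String> insertRecord) {
    if (insertRecord.isPresent() && insertRecord.get().equals(IGNORE_RECORD)) {
      return;
    }
    // An empty Optional writes nothing; a present value is appended.
    insertRecord.ifPresent(written::add);
  }
}
```

With a `void` signature the skip path needs no dummy return value, which is the simplification the review asks for.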








[GitHub] [hudi] hudi-bot edited a comment on pull request #3769: [HUDI-2005][WIP] Fixing partition path creation in AbstractTableFileSystemView

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3769:
URL: https://github.com/apache/hudi/pull/3769#issuecomment-939013386


   
   ## CI report:
   
   * 6d2736aac89a72f32f518faf94702a12574b7f19 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2552)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3769: [HUDI-2005][WIP] Fixing partition path creation in AbstractTableFileSystemView

2021-10-08 Thread GitBox


hudi-bot commented on pull request #3769:
URL: https://github.com/apache/hudi/pull/3769#issuecomment-939013386


   
   ## CI report:
   
   * 6d2736aac89a72f32f518faf94702a12574b7f19 UNKNOWN
   
   




[GitHub] [hudi] nsivabalan opened a new pull request #3769: [WIP] Fixing partition path creation in AbstractTableFileSystemView

2021-10-08 Thread GitBox


nsivabalan opened a new pull request #3769:
URL: https://github.com/apache/hudi/pull/3769


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] nsivabalan commented on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-08 Thread GitBox


nsivabalan commented on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-938830452


   @danny0405 @leesf @yanghua: Could one of you review this for the Flink engine? 
There was some divergence between Flink and Spark with respect to compaction, 
so we want to ensure things are good for Flink with this refactoring. 






[GitHub] [hudi] nsivabalan commented on a change in pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-08 Thread GitBox


nsivabalan commented on a change in pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#discussion_r725160600



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##
@@ -18,39 +18,258 @@
 
 package org.apache.hudi.table.action.compact;
 
+import org.apache.hudi.avro.model.HoodieCompactionOperation;
 import org.apache.hudi.avro.model.HoodieCompactionPlan;
+import org.apache.hudi.client.AbstractHoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.data.HoodieAccumulator;
+import org.apache.hudi.common.data.HoodieData;
 import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.engine.TaskContextSupplier;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.CompactionOperation;
+import org.apache.hudi.common.model.HoodieBaseFile;
 import org.apache.hudi.common.model.HoodieFileGroupId;
+import org.apache.hudi.common.model.HoodieLogFile;
 import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.model.HoodieWriteStat.RuntimeStats;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.view.TableFileSystemView.SliceView;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.CompactionUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.io.IOUtils;
+import org.apache.hudi.table.HoodieCopyOnWriteTableOperation;
 import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.compact.strategy.CompactionStrategy;
+
+import org.apache.avro.Schema;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
 
 import java.io.IOException;
 import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
 import java.util.Set;
+import java.util.stream.StreamSupport;
+
+import static java.util.stream.Collectors.toList;
 
 /**
  * A HoodieCompactor runs compaction on a hoodie table.
  */
-public interface HoodieCompactor 
extends Serializable {
+public abstract class HoodieCompactor 
implements Serializable {
+
+  private static final Logger LOG = 
LogManager.getLogger(HoodieCompactor.class);
+
+  public abstract Schema getReaderSchema(HoodieWriteConfig config);
+
+  public abstract void updateReaderSchema(HoodieWriteConfig config, 
HoodieTableMetaClient metaClient);
+
+  public abstract void checkCompactionTimeline(

Review comment:
   oh, I see. got it.








[jira] [Assigned] (HUDI-2531) [UMBRELLA] Support Dataset APIs in writer paths

2021-10-08 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-2531:


Assignee: Raymond Xu

> [UMBRELLA] Support Dataset APIs in writer paths
> ---
>
> Key: HUDI-2531
> URL: https://issues.apache.org/jira/browse/HUDI-2531
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Critical
>  Labels: hudi-umbrellas, sev:critical, user-support-issues
>
> To make use of Dataset APIs in writer paths instead of RDD.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (10e3a9a -> a818020)

2021-10-08 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 10e3a9a  [MINOR] Fix typo,'properites' corrected to 'properties' 
(#3738)
 add a818020  [HUDI-2530] Adding async compaction support to integ test 
suite framework (#3750)

No new revisions were added by this update.

Summary of changes:
 .../{test.properties => compact-test.properties}   |   9 +-
 ...complex-dag-mor.yaml => mor-async-compact.yaml} | 108 ++---
 docker/demo/config/test-suite/test.properties  |   4 -
 .../integ/testsuite/HoodieTestSuiteWriter.java |  20 +++-
 .../integ/testsuite/dag/nodes/CompactNode.java |   7 +-
 .../testsuite/dag/nodes/ScheduleCompactNode.java   |   1 -
 6 files changed, 99 insertions(+), 50 deletions(-)
 copy docker/demo/config/test-suite/{test.properties => 
compact-test.properties} (89%)
 copy docker/demo/config/test-suite/{complex-dag-mor.yaml => 
mor-async-compact.yaml} (53%)


[GitHub] [hudi] nsivabalan merged pull request #3750: [HUDI-2530] Adding async compaction support to integ test suite framework

2021-10-08 Thread GitBox


nsivabalan merged pull request #3750:
URL: https://github.com/apache/hudi/pull/3750


   






[jira] [Updated] (HUDI-2530) Add async compaction support to integ test suite infra

2021-10-08 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2530:
--
Status: Patch Available  (was: In Progress)

> Add async compaction support to integ test suite infra
> --
>
> Key: HUDI-2530
> URL: https://issues.apache.org/jira/browse/HUDI-2530
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Add async compaction support to integ test suite infra





[jira] [Updated] (HUDI-2530) Add async compaction support to integ test suite infra

2021-10-08 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2530:
--
Status: In Progress  (was: Open)

> Add async compaction support to integ test suite infra
> --
>
> Key: HUDI-2530
> URL: https://issues.apache.org/jira/browse/HUDI-2530
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Add async compaction support to integ test suite infra





[GitHub] [hudi] hudi-bot edited a comment on pull request #3750: [HUDI-2530] Adding async compaction support to integ test suite framework

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3750:
URL: https://github.com/apache/hudi/pull/3750#issuecomment-934504077


   
   ## CI report:
   
   * 99d1ec58c8fcb418986ccba294969aea414c6abb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2550)
 
   
   




[jira] [Created] (HUDI-2536) Rename compaction config keys

2021-10-08 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-2536:
-

 Summary: Rename compaction config keys 
 Key: HUDI-2536
 URL: https://issues.apache.org/jira/browse/HUDI-2536
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit
Assignee: Sagar Sumit


Certain compaction configs, such as 
"hoodie.compact.inline.trigger.strategy" (used for triggering compaction), 
have "inline" in their name but are used for async compaction as well. This task is to 
rename such configs while ensuring backward compatibility with the older names.
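
A minimal sketch of one way a renamed key could stay backward compatible, offered as an illustration rather than as how Hudi implements it. The class `RenamedConfigSketch` and the method `triggerStrategy` are hypothetical, and the replacement key `hoodie.compact.trigger.strategy` is an assumed name; only the old key comes from the issue text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of reading a renamed config key while still honoring the old one.
class RenamedConfigSketch {
  // Existing key quoted in the issue.
  static final String OLD_KEY = "hoodie.compact.inline.trigger.strategy";
  // Hypothetical replacement key, chosen only for illustration.
  static final String NEW_KEY = "hoodie.compact.trigger.strategy";

  // Prefer the new key; fall back to the old key so existing jobs keep working.
  static Optional<String> triggerStrategy(Map<String, String> props) {
    String value = props.get(NEW_KEY);
    if (value == null) {
      value = props.get(OLD_KEY);
    }
    return Optional.ofNullable(value);
  }
}
```

Checking the new key first means a job that sets both keys gets the new value, while a job that only sets the old key behaves as before.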





[jira] [Updated] (HUDI-2536) Rename compaction config keys

2021-10-08 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-2536:
--
Component/s: configs
 Compaction

> Rename compaction config keys 
> --
>
> Key: HUDI-2536
> URL: https://issues.apache.org/jira/browse/HUDI-2536
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Compaction, configs
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Critical
>
> Certain compaction configs, such as 
> "hoodie.compact.inline.trigger.strategy" (used for triggering compaction), 
> have "inline" in their name but are used for async compaction as well. This task is to 
> rename such configs while ensuring backward compatibility with the older names.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3767: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3767:
URL: https://github.com/apache/hudi/pull/3767#issuecomment-938527118


   
   ## CI report:
   
   * 7af3a772bdd286cf440ad359b7f8ef9b33b9cf01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2545)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2549)
 
   
   




[jira] [Updated] (HUDI-2535) Late arriving records and global index with partition path update set to true

2021-10-08 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2535:
--
Labels: sev:high user-support-issues  (was: user-support-issues)

> Late arriving records and global index with partition path update set to true
> -
>
> Key: HUDI-2535
> URL: https://issues.apache.org/jira/browse/HUDI-2535
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: sev:high, user-support-issues
>
> In case of a global index, we have a config to update the partition path. If 
> this is set to true and an incoming record belongs to a newer partition than 
> what's in storage, the older record will be deleted and the new incoming 
> record will be routed to the new partition. 
> But it could run into issues if the new incoming record is a late-arriving 
> record. The expected behavior is that the old record is retained and the new 
> one is discarded if it has a lower preCombine value. But in this case, we may 
> not honor that. 
>  





[jira] [Updated] (HUDI-2535) Late arriving records and global index with partition path update set to true

2021-10-08 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2535:
--
Labels: user-support-issues  (was: )

> Late arriving records and global index with partition path update set to true
> -
>
> Key: HUDI-2535
> URL: https://issues.apache.org/jira/browse/HUDI-2535
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: user-support-issues
>
> In case of a global index, we have a config to update the partition path. If 
> this is set to true and an incoming record belongs to a newer partition than 
> what's in storage, the older record will be deleted and the new incoming 
> record will be routed to the new partition. 
> But it could run into issues if the new incoming record is a late-arriving 
> record. The expected behavior is that the old record is retained and the new 
> one is discarded if it has a lower preCombine value. But in this case, we may 
> not honor that. 
>  





[jira] [Assigned] (HUDI-2535) Late arriving records and global index with partition path update set to true

2021-10-08 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2535:
-

Assignee: sivabalan narayanan

> Late arriving records and global index with partition path update set to true
> -
>
> Key: HUDI-2535
> URL: https://issues.apache.org/jira/browse/HUDI-2535
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: sev:high, user-support-issues
>
> In case of a global index, we have a config to update the partition path. If 
> this is set to true and an incoming record belongs to a newer partition than 
> what's in storage, the older record will be deleted and the new incoming 
> record will be routed to the new partition. 
> But it could run into issues if the new incoming record is a late-arriving 
> record. The expected behavior is that the old record is retained and the new 
> one is discarded if it has a lower preCombine value. But in this case, we may 
> not honor that. 
>  





[jira] [Created] (HUDI-2535) Late arriving records and global index with partition path update set to true

2021-10-08 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-2535:
-

 Summary: Late arriving records and global index with partition 
path update set to true
 Key: HUDI-2535
 URL: https://issues.apache.org/jira/browse/HUDI-2535
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Writer Core
Reporter: sivabalan narayanan


In case of a global index, we have a config to update the partition path. If 
this is set to true and an incoming record belongs to a newer partition than 
what's in storage, the older record will be deleted and the new incoming 
record will be routed to the new partition. 

But it could run into issues if the new incoming record is a late-arriving 
record. The expected behavior is that the old record is retained and the new 
one is discarded if it has a lower preCombine value. But in this case, we may 
not honor that. 
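
The expected behavior described above can be sketched roughly as follows. This is a toy model rather than Hudi's writer code: the Record type, the apply_incoming function and the dict standing in for the global index are all hypothetical, and letting the incoming record win on equal preCombine values is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    key: str
    partition: str
    precombine: int  # ordering value, e.g. an event timestamp


def apply_incoming(stored: Dict[str, Record], incoming: Record) -> Optional[Record]:
    """Return the record that was written, or None if the incoming one was discarded."""
    existing = stored.get(incoming.key)
    if existing is not None and incoming.precombine < existing.precombine:
        # Late-arriving record: keep the stored record in its current partition.
        return None
    # New key, or same/newer ordering value: the incoming record wins. Because
    # the index is global (keyed by record key only), overwriting the entry also
    # models deleting the old copy and routing the record to the new partition.
    stored[incoming.key] = incoming
    return incoming
```

The point of the sketch is the order of checks: the preCombine comparison happens before the partition path update, so a late record never displaces a newer one.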

 





[GitHub] [hudi] hudi-bot edited a comment on pull request #3768: [HUDI-2494] Fixing glob pattern to skip all hoodie meta paths

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3768:
URL: https://github.com/apache/hudi/pull/3768#issuecomment-938643768


   
   ## CI report:
   
   * d08cac085906b121ab6e72e822add6d38a3c6034 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2548)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3750: [HUDI-2530] Adding async compaction support to integ test suite framework

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3750:
URL: https://github.com/apache/hudi/pull/3750#issuecomment-934504077


   
   ## CI report:
   
   * 6f2bc4169918ef270a1982605ffe6025be466f3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2508)
 
   * 99d1ec58c8fcb418986ccba294969aea414c6abb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2550)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-938488397


   
   ## CI report:
   
   * a215aa25074d52c2002a32b68c02b4429902ab3d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2543)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2547)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3750: [HUDI-2530] Adding async compaction support to integ test suite framework

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3750:
URL: https://github.com/apache/hudi/pull/3750#issuecomment-934504077


   
   ## CI report:
   
   * 6f2bc4169918ef270a1982605ffe6025be466f3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2508)
 
   * 99d1ec58c8fcb418986ccba294969aea414c6abb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] nsivabalan commented on pull request #3486: [HUDI-2314][WIP] Add support for DynamoDb based lock

2021-10-08 Thread GitBox


nsivabalan commented on pull request #3486:
URL: https://github.com/apache/hudi/pull/3486#issuecomment-938662308


   @zhedoubushishi: I see "WIP" in the title of the patch. May I know if it's 
ready for review? 






[GitHub] [hudi] hudi-bot edited a comment on pull request #3767: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3767:
URL: https://github.com/apache/hudi/pull/3767#issuecomment-938527118


   
   ## CI report:
   
   * 7af3a772bdd286cf440ad359b7f8ef9b33b9cf01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2545)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2549)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3768: [HUDI-2494] Fixing glob pattern to skip all hoodie meta paths

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3768:
URL: https://github.com/apache/hudi/pull/3768#issuecomment-938643768


   
   ## CI report:
   
   * d08cac085906b121ab6e72e822add6d38a3c6034 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2548)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] nsivabalan commented on a change in pull request #3750: [HUDI-2530] Adding async compaction support to integ test suite framework

2021-10-08 Thread GitBox


nsivabalan commented on a change in pull request #3750:
URL: https://github.com/apache/hudi/pull/3750#discussion_r725013996



##
File path: docker/demo/config/test-suite/compact-test.properties
##
@@ -0,0 +1,58 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+hoodie.insert.shuffle.parallelism=100
+hoodie.upsert.shuffle.parallelism=100
+hoodie.bulkinsert.shuffle.parallelism=100
+
+hoodie.deltastreamer.source.test.num_partitions=100
+hoodie.deltastreamer.source.test.datagen.use_rocksdb_for_storing_existing_keys=false
+hoodie.deltastreamer.source.test.max_unique_records=1
+hoodie.embed.timeline.server=false
+hoodie.deltastreamer.source.input.selector=org.apache.hudi.integ.testsuite.helpers.DFSTestSuitePathSelector
+
+hoodie.insert.shuffle.parallelism=100
+hoodie.upsert.shuffle.parallelism=100
+hoodie.bulkinsert.shuffle.parallelism=100
+
+hoodie.deltastreamer.source.input.selector=org.apache.hudi.integ.testsuite.helpers.DFSTestSuitePathSelector
+hoodie.datasource.hive_sync.skip_ro_suffix=true
+
+hoodie.datasource.write.recordkey.field=_row_key
+hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
+hoodie.datasource.write.partitionpath.field=timestamp
+
+hoodie.compact.inline.max.delta.commits=2
+
+hoodie.clustering.plan.strategy.sort.columns=_row_key
+hoodie.clustering.plan.strategy.daybased.lookback.partitions=0
+hoodie.clustering.inline.max.commits=1

Review comment:
   yes, will remove them. 








[GitHub] [hudi] danny0405 commented on pull request #3767: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


danny0405 commented on pull request #3767:
URL: https://github.com/apache/hudi/pull/3767#issuecomment-938644823


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot commented on pull request #3768: [HUDI-2494] Fixing glob pattern to skip all hoodie meta paths

2021-10-08 Thread GitBox


hudi-bot commented on pull request #3768:
URL: https://github.com/apache/hudi/pull/3768#issuecomment-938643768


   
   ## CI report:
   
   * d08cac085906b121ab6e72e822add6d38a3c6034 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-2494) Fix usage of different key generators with metadata enabled

2021-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2494:
-
Labels: pull-request-available sev:critical  (was: sev:critical)

> Fix usage of different key generators with metadata enabled
> ---
>
> Key: HUDI-2494
> URL: https://issues.apache.org/jira/browse/HUDI-2494
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>
> With [sync metadata patch|https://github.com/apache/hudi/pull/3590/], when 
> metadata is enabled by default, some spark datasource tests failed which were 
> using timestamp based key gen and custom key gen. Metadata table's records 
> are getting picked up when we do 
>  
> {code:java}
> spark.read.format("hudi").load(basePath + "/*/*")
> {code}
>  
> For now, I have disabled metadata for these tests. 
> testSparkPartitonByWithTimestampBasedKeyGenerator
> testSparkPartitonByWithCustomKeyGenerator
>  
> I was looking at 
> [options|https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html]
>  to ignore certain paths, but it looks like there is none. 
>  
>  





[GitHub] [hudi] nsivabalan opened a new pull request #3768: [HUDI-2494] Fixing glob pattern to skip all hoodie meta paths

2021-10-08 Thread GitBox


nsivabalan opened a new pull request #3768:
URL: https://github.com/apache/hudi/pull/3768


   ## What is the purpose of the pull request
   
   After removing the glob pattern requirement in the base path while loading 
hudi, fs.globStatus(basePath) also returns ".hoodie" meta paths. When metadata 
is enabled, we may not want those files to be returned. Hence, removing all 
meta paths from the glob pattern lookup. 
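
As a rough illustration of the filtering idea (not the actual globPath() change in this PR), the sketch below drops every globbed path that is, or sits under, a ".hoodie" directory. The function name skip_meta_paths and the use of plain string paths are assumptions made for the example.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

META_FOLDER = ".hoodie"  # the meta directory name mentioned in the description


def skip_meta_paths(paths: Iterable[str]) -> List[str]:
    """Drop any path that is, or lives under, a .hoodie directory."""
    # Compare whole path components so that only the directory itself matches,
    # not arbitrary file names that merely start with ".hoodie".
    return [p for p in paths if META_FOLDER not in PurePosixPath(p).parts]
```

Matching on path components rather than substrings is a design choice of this sketch; the real fix may decide differently about which files count as meta paths.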
   
   ## Brief change log
   - Fixed HoodieSparkSqlUtils.globPath() to skip all meta paths. 
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
 - Fixed tests in TestHoodieSparkUtils to verify the fix.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3764: [MINOR] Fix typo,'paritition' corrected to 'partition'

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3764:
URL: https://github.com/apache/hudi/pull/3764#issuecomment-938381930


   
   ## CI report:
   
   * 264e8f48602d37c833edddab90df9cd534fb2277 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2540)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2546)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-938488397


   
   ## CI report:
   
   * a215aa25074d52c2002a32b68c02b4429902ab3d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2543)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2547)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] zhangyue19921010 commented on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-10-08 Thread GitBox


zhangyue19921010 commented on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-938627449


   @hudi-bot run azure






[hudi] branch asf-site updated: [DOCS] fixed typo for kafkacat -> kcat (#3763)

2021-10-08 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new c80f095  [DOCS] fixed typo for kafkacat -> kcat (#3763)
c80f095 is described below

commit c80f0957b7ae5f11cbda5ccae609c6fca98492f1
Author: Kyle Weller 
AuthorDate: Fri Oct 8 05:39:26 2021 -0700

[DOCS] fixed typo for kafkacat -> kcat (#3763)
---
 website/versioned_docs/version-0.9.0/docker_demo.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/website/versioned_docs/version-0.9.0/docker_demo.md 
b/website/versioned_docs/version-0.9.0/docker_demo.md
index 0f1a194..1754d75 100644
--- a/website/versioned_docs/version-0.9.0/docker_demo.md
+++ b/website/versioned_docs/version-0.9.0/docker_demo.md
@@ -15,7 +15,7 @@ The steps have been tested on a Mac laptop
 ### Prerequisites
 
   * Docker Setup :  For Mac, Please follow the steps as defined in 
[https://docs.docker.com/v17.12/docker-for-mac/install/]. For running Spark-SQL 
queries, please ensure atleast 6 GB and 4 CPUs are allocated to Docker (See 
Docker -> Preferences -> Advanced). Otherwise, spark-SQL queries could be 
killed because of memory issues.
-  * kafkacat : A command-line utility to publish/consume from kafka topics. 
Use `brew install kafkacat` to install kafkacat.
+  * kcat : A command-line utility to publish/consume from kafka topics. Use 
`brew install kcat` to install kcat.
   * /etc/hosts : The demo references many services running in container by the 
hostname. Add the following settings to /etc/hosts
 
 ```java
@@ -107,11 +107,11 @@ The batches are windowed intentionally so that the second 
batch contains updates
 
 ### Step 1 : Publish the first batch to Kafka
 
-Upload the first batch to Kafka topic 'stock ticks' `cat 
docker/demo/data/batch_1.json | kafkacat -b kafkabroker -t stock_ticks -P`
+Upload the first batch to Kafka topic 'stock ticks' `cat 
docker/demo/data/batch_1.json | kcat -b kafkabroker -t stock_ticks -P`
 
 To check if the new topic shows up, use
 ```java
-kafkacat -b kafkabroker -L -J | jq .
+kcat -b kafkabroker -L -J | jq .
 {
   "originating_broker": {
 "id": 1001,
@@ -552,7 +552,7 @@ Upload the second batch of data and ingest this batch using 
delta-streamer. As t
 partitions, there is no need to run hive-sync
 
 ```java
-cat docker/demo/data/batch_2.json | kafkacat -b kafkabroker -t stock_ticks -P
+cat docker/demo/data/batch_2.json | kcat -b kafkabroker -t stock_ticks -P
 
 # Within Docker container, run the ingestion command
 docker exec -it adhoc-2 /bin/bash


[GitHub] [hudi] vinothchandar merged pull request #3763: [MINOR] - Fixed typo in docker demo docs for kafkacat -> kcat

2021-10-08 Thread GitBox


vinothchandar merged pull request #3763:
URL: https://github.com/apache/hudi/pull/3763


   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3767: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3767:
URL: https://github.com/apache/hudi/pull/3767#issuecomment-938527118


   
   ## CI report:
   
   * 7af3a772bdd286cf440ad359b7f8ef9b33b9cf01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2545)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] yanghua commented on pull request #3671: [HUDI-2418] add HiveSchemaProvider

2021-10-08 Thread GitBox


yanghua commented on pull request #3671:
URL: https://github.com/apache/hudi/pull/3671#issuecomment-938603036


   @fengjian428 Would you please recheck the Azure CI? It still seems to be failing.






[GitHub] [hudi] hudi-bot edited a comment on pull request #3764: [MINOR] Fix typo,'paritition' corrected to 'partition'

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3764:
URL: https://github.com/apache/hudi/pull/3764#issuecomment-938381930


   
   ## CI report:
   
   * 264e8f48602d37c833edddab90df9cd534fb2277 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2540)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2546)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] yanghua commented on pull request #3764: [MINOR] Fix typo,'paritition' corrected to 'partition'

2021-10-08 Thread GitBox


yanghua commented on pull request #3764:
URL: https://github.com/apache/hudi/pull/3764#issuecomment-938599143


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot edited a comment on pull request #3767: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3767:
URL: https://github.com/apache/hudi/pull/3767#issuecomment-938527118


   
   ## CI report:
   
   * d0785c12de9936ea9ee2075664d4ee2f4d48570f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2544)
 
   * 7af3a772bdd286cf440ad359b7f8ef9b33b9cf01 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2545)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3767: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3767:
URL: https://github.com/apache/hudi/pull/3767#issuecomment-938527118


   
   ## CI report:
   
   * d0785c12de9936ea9ee2075664d4ee2f4d48570f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2544)
 
   * 7af3a772bdd286cf440ad359b7f8ef9b33b9cf01 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3767: [HUDI-2534] Remove the sort operation when bulk_insert in batch mode

2021-10-08 Thread GitBox


hudi-bot edited a comment on pull request #3767:
URL: https://github.com/apache/hudi/pull/3767#issuecomment-938527118


   
   ## CI report:
   
   * d0785c12de9936ea9ee2075664d4ee2f4d48570f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2544)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   





