[GitHub] [hudi] hudi-bot edited a comment on pull request #3289: [HUDI-2187] Add a shim layer to support multiple hive version

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3289:
URL: https://github.com/apache/hudi/pull/3289#issuecomment-881900670


   
   ## CI report:
   
   * 04cc6dc7a378f36d70c84269baeaae1bd935fdb6 UNKNOWN
   * cba2f23fb2cfbf01dcd2dc26ca981fd447a7b005 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1389)
 
   * 316b83a6f045c9cc57f049d092463402d4139b95 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2187) Hive integration Improvment

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393668#comment-17393668
 ] 

ASF GitHub Bot commented on HUDI-2187:
--

hudi-bot edited a comment on pull request #3289:
URL: https://github.com/apache/hudi/pull/3289#issuecomment-881900670


   
   ## CI report:
   
   * 04cc6dc7a378f36d70c84269baeaae1bd935fdb6 UNKNOWN
   * cba2f23fb2cfbf01dcd2dc26ca981fd447a7b005 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1389)
 
   * 316b83a6f045c9cc57f049d092463402d4139b95 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Hive integration Improvment
> ---
>
> Key: HUDI-2187
> URL: https://issues.apache.org/jira/browse/HUDI-2187
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Major
>  Labels: pull-request-available
>
> See the details in the RFC doc:
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+31%3A+Hive+integration+Improvment
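The shim-layer idea referenced above (supporting multiple Hive versions behind one interface) can be sketched roughly as follows; all class and method names here are hypothetical illustrations, not Hudi's actual API:

```java
// Rough, dependency-free sketch of a version shim layer: Hive-version-specific
// calls hide behind one interface, and a loader picks the implementation that
// matches the Hive version detected at runtime. Names are hypothetical.
public class HiveShimSketch {
  interface HiveShim {
    String describe();
  }

  static class Hive1Shim implements HiveShim {
    public String describe() { return "calls Hive 1.x metastore APIs"; }
  }

  static class Hive2Shim implements HiveShim {
    public String describe() { return "calls Hive 2.x metastore APIs"; }
  }

  // Select the shim for the detected Hive major version.
  static HiveShim loadShim(String hiveVersion) {
    if (hiveVersion.startsWith("1.")) {
      return new Hive1Shim();
    }
    if (hiveVersion.startsWith("2.")) {
      return new Hive2Shim();
    }
    throw new IllegalArgumentException("Unsupported Hive version: " + hiveVersion);
  }

  public static void main(String[] args) {
    System.out.println(loadShim("2.3.1").describe());
  }
}
```

Callers depend only on the interface, so adding support for another Hive version means adding one shim class rather than branching on the version at every call site.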



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3401: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3401:
URL: https://github.com/apache/hudi/pull/3401#issuecomment-892472052


   
   ## CI report:
   
   * 7fe0db6bdda8a2f543d068efa1cbb60682b2ef95 UNKNOWN
   * 829139528e53953e6f39d708d31fb876f0a32cd1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1392)
 
   * 0b34d55f238b889fb2fcc2526e4657ea981c431c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1397)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393671#comment-17393671
 ] 

ASF GitHub Bot commented on HUDI-2170:
--

hudi-bot edited a comment on pull request #3401:
URL: https://github.com/apache/hudi/pull/3401#issuecomment-892472052


   
   ## CI report:
   
   * 7fe0db6bdda8a2f543d068efa1cbb60682b2ef95 UNKNOWN
   * 829139528e53953e6f39d708d31fb876f0a32cd1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1392)
 
   * 0b34d55f238b889fb2fcc2526e4657ea981c431c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1397)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Always choose the latest record for HoodieRecordPayload
> ---
>
> Key: HUDI-2170
> URL: https://issues.apache.org/jira/browse/HUDI-2170
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Now, in {{OverwriteWithLatestAvroPayload.preCombine}}, we still choose the old
> record when the new record has the same preCombine field value as the old one;
> it is more natural to keep the new incoming record instead. The
> {{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already does
> that.
> See issue: https://github.com/apache/hudi/issues/3266.
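The tie-breaking semantics discussed here can be sketched with a dependency-free stand-in (the Record class and method names below are illustrative, not Hudi's types): the old behavior keeps the old record on a tie of the preCombine value, while the proposed behavior keeps the new incoming record.

```java
// Simplified stand-in for the preCombine semantics discussed in HUDI-2170.
// The real logic lives in OverwriteWithLatestAvroPayload; this Record class
// is illustrative only.
public class PreCombineSketch {
  static final class Record {
    final String key;
    final long orderingVal; // the preCombine field
    Record(String key, long orderingVal) {
      this.key = key;
      this.orderingVal = orderingVal;
    }
  }

  // Old behavior: the new record wins only on a strictly greater ordering value,
  // so a tie keeps the old record.
  static Record preCombineOld(Record oldRec, Record newRec) {
    return newRec.orderingVal > oldRec.orderingVal ? newRec : oldRec;
  }

  // Proposed behavior: on a tie, prefer the latest (incoming) record.
  static Record preCombineNew(Record oldRec, Record newRec) {
    return newRec.orderingVal >= oldRec.orderingVal ? newRec : oldRec;
  }

  public static void main(String[] args) {
    Record oldRec = new Record("old", 5L);
    Record newRec = new Record("new", 5L); // same preCombine field value
    System.out.println(preCombineOld(oldRec, newRec).key); // old
    System.out.println(preCombineNew(oldRec, newRec).key); // new
  }
}
```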



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3289: [HUDI-2187] Add a shim layer to support multiple hive version

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3289:
URL: https://github.com/apache/hudi/pull/3289#issuecomment-881900670


   
   ## CI report:
   
   * 04cc6dc7a378f36d70c84269baeaae1bd935fdb6 UNKNOWN
   * cba2f23fb2cfbf01dcd2dc26ca981fd447a7b005 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1389)
 
   * 316b83a6f045c9cc57f049d092463402d4139b95 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1398)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2187) Hive integration Improvment

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393682#comment-17393682
 ] 

ASF GitHub Bot commented on HUDI-2187:
--

hudi-bot edited a comment on pull request #3289:
URL: https://github.com/apache/hudi/pull/3289#issuecomment-881900670


   
   ## CI report:
   
   * 04cc6dc7a378f36d70c84269baeaae1bd935fdb6 UNKNOWN
   * cba2f23fb2cfbf01dcd2dc26ca981fd447a7b005 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1389)
 
   * 316b83a6f045c9cc57f049d092463402d4139b95 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1398)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Hive integration Improvment
> ---
>
> Key: HUDI-2187
> URL: https://issues.apache.org/jira/browse/HUDI-2187
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Major
>  Labels: pull-request-available
>
> See the details in the RFC doc:
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+31%3A+Hive+integration+Improvment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch release-0.5.0 created (now 1cfd311)

2021-08-05 Thread pwason
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a change to branch release-0.5.0
in repository https://gitbox.apache.org/repos/asf/hudi.git.


  at 1cfd311  Changing release version from 0.5.0-incubating-rc6 to 0.5.0-incubating

No new revisions were added by this update.


[hudi] 03/03: [HUDI-458] Redo hudi-hadoop-mr log statements using SLF4J (#1210)

2021-08-05 Thread pwason
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a commit to branch redo-log
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit 8f8cd42e9792e0a9648b68f933929ad87cd503c5
Author: Mathieu <49835526+wangxian...@users.noreply.github.com>
AuthorDate: Mon Jan 13 11:18:09 2020 +0800

[HUDI-458] Redo hudi-hadoop-mr log statements using SLF4J (#1210)
---
 hudi-hadoop-mr/pom.xml |  7 ++
 .../org/apache/hudi/hadoop/HoodieHiveUtil.java | 12 +-
 .../hudi/hadoop/HoodieParquetInputFormat.java  | 26 ++--
 .../hudi/hadoop/HoodieROTablePathFilter.java   | 28 +++---
 .../hudi/hadoop/RecordReaderValueIterator.java |  6 ++---
 .../hadoop/hive/HoodieCombineHiveInputFormat.java  | 27 ++---
 .../realtime/AbstractRealtimeRecordReader.java | 24 +--
 .../realtime/HoodieParquetRealtimeInputFormat.java | 22 -
 .../realtime/HoodieRealtimeRecordReader.java   |  6 ++---
 .../realtime/RealtimeCompactedRecordReader.java| 14 +--
 10 files changed, 89 insertions(+), 83 deletions(-)

diff --git a/hudi-hadoop-mr/pom.xml b/hudi-hadoop-mr/pom.xml
index 6bc3c8e..2a222f3 100644
--- a/hudi-hadoop-mr/pom.xml
+++ b/hudi-hadoop-mr/pom.xml
@@ -81,6 +81,13 @@
       <artifactId>hive-exec</artifactId>
     </dependency>
 
+
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-api</artifactId>
+      <version>${slf4j.version}</version>
+    </dependency>
+
 
     <dependency>
       <groupId>org.apache.hudi</groupId>
diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHiveUtil.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHiveUtil.java
index 1db8c54..f371719 100644
--- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHiveUtil.java
+++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHiveUtil.java
@@ -20,12 +20,12 @@ package org.apache.hudi.hadoop;
 
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapreduce.JobContext;
-import org.apache.log4j.LogManager;
-import org.apache.log4j.Logger;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 public class HoodieHiveUtil {
 
-  public static final Logger LOG = LogManager.getLogger(HoodieHiveUtil.class);
+  public static final Logger LOG = LoggerFactory.getLogger(HoodieHiveUtil.class);
 
   public static final String HOODIE_CONSUME_MODE_PATTERN = "hoodie.%s.consume.mode";
   public static final String HOODIE_START_COMMIT_PATTERN = "hoodie.%s.consume.start.timestamp";
@@ -43,20 +43,20 @@ public class HoodieHiveUtil {
     if (maxCommits == MAX_COMMIT_ALL) {
       maxCommits = Integer.MAX_VALUE;
     }
-    LOG.info("Read max commits - " + maxCommits);
+    LOG.info("Read max commits - {}", maxCommits);
     return maxCommits;
   }
 
  public static String readStartCommitTime(JobContext job, String tableName) {
    String startCommitTimestampName = String.format(HOODIE_START_COMMIT_PATTERN, tableName);
-    LOG.info("Read start commit time - " + job.getConfiguration().get(startCommitTimestampName));
+    LOG.info("Read start commit time - {}", job.getConfiguration().get(startCommitTimestampName));
    return job.getConfiguration().get(startCommitTimestampName);
  }
 
  public static String readMode(JobContext job, String tableName) {
    String modePropertyName = String.format(HOODIE_CONSUME_MODE_PATTERN, tableName);
    String mode = job.getConfiguration().get(modePropertyName, DEFAULT_SCAN_MODE);
-    LOG.info(modePropertyName + ": " + mode);
+    LOG.info("Hoodie consume mode pattern is : {}, mode is : {}", modePropertyName, mode);
    return mode;
  }
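The diffs in this commit swap log4j string concatenation for SLF4J's `{}` placeholder style, so the message string is only assembled when the log level is actually enabled. A minimal stand-in formatter (not SLF4J itself; the commits use `org.slf4j.LoggerFactory` as shown) illustrates the substitution:

```java
// Stand-in for SLF4J's "{}" placeholder substitution, to show why
// LOG.info("Read max commits - {}", maxCommits) is preferred over
// concatenation: arguments are spliced in lazily, only when needed.
// This is NOT the SLF4J implementation, just a sketch of the idea.
public class Slf4jStyleSketch {
  // Replace each "{}" in the pattern with the next argument, left to right.
  static String format(String pattern, Object... args) {
    StringBuilder sb = new StringBuilder();
    int argIdx = 0;
    int from = 0;
    int at;
    while ((at = pattern.indexOf("{}", from)) >= 0 && argIdx < args.length) {
      sb.append(pattern, from, at).append(args[argIdx++]);
      from = at + 2; // skip past the "{}" placeholder
    }
    return sb.append(pattern.substring(from)).toString();
  }

  public static void main(String[] args) {
    // Same message shape as the HoodieHiveUtil diff above.
    System.out.println(format("Read max commits - {}", 30));
  }
}
```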
 
diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
index e8f7de0..ea92f11 100644
--- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
+++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
@@ -42,8 +42,8 @@ import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.RecordReader;
 import org.apache.hadoop.mapred.Reporter;
 import org.apache.hadoop.mapreduce.Job;
-import org.apache.log4j.LogManager;
-import org.apache.log4j.Logger;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
 import java.util.ArrayList;
@@ -60,7 +60,7 @@ import java.util.stream.Collectors;
 @UseFileSplitsFromInputFormat
 public class HoodieParquetInputFormat extends MapredParquetInputFormat implements Configurable {
 
-  private static final Logger LOG = LogManager.getLogger(HoodieParquetInputFormat.class);
+  private static final Logger LOG = LoggerFactory.getLogger(HoodieParquetInputFormat.class);
 
   protected Configuration conf;
 
@@ -69,7 +69,7 @@ public class HoodieParquetInputFormat extends MapredParquetInputFormat implement
     // Get all the file status from FileInputFormat and then do the filter
     FileStatus[] fileStatuses = super.lis

[hudi] 01/03: [HUDI-459] Redo hudi-hive log statements using SLF4J (#1203)

2021-08-05 Thread pwason
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a commit to branch redo-log
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit ac105e6d9b6dcd75f8145af6ce03600af40180e0
Author: lamber-ken 
AuthorDate: Fri Jan 10 09:38:34 2020 +0800

[HUDI-459] Redo hudi-hive log statements using SLF4J (#1203)
---
 hudi-hive/pom.xml  |  5 +++
 .../java/org/apache/hudi/hive/HiveSyncTool.java| 30 +++
 .../org/apache/hudi/hive/HoodieHiveClient.java | 44 +++---
 .../java/org/apache/hudi/hive/util/SchemaUtil.java | 12 +++---
 .../org/apache/hudi/hive/util/HiveTestService.java | 10 ++---
 5 files changed, 53 insertions(+), 48 deletions(-)

diff --git a/hudi-hive/pom.xml b/hudi-hive/pom.xml
index c552b70..1ab2533 100644
--- a/hudi-hive/pom.xml
+++ b/hudi-hive/pom.xml
@@ -49,6 +49,11 @@
       <groupId>log4j</groupId>
       <artifactId>log4j</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-api</artifactId>
+      <version>${slf4j.version}</version>
+    </dependency>
 
     <dependency>
       <groupId>org.apache.parquet</groupId>
diff --git a/hudi-hive/src/main/java/org/apache/hudi/hive/HiveSyncTool.java b/hudi-hive/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
index 6bcb697..4029096 100644
--- a/hudi-hive/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
+++ b/hudi-hive/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
@@ -34,8 +34,8 @@ import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.api.Partition;
 import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat;
 import org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe;
-import org.apache.log4j.LogManager;
-import org.apache.log4j.Logger;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 import org.apache.parquet.schema.MessageType;
 
 import java.util.List;
@@ -52,7 +52,7 @@ import java.util.stream.Collectors;
 @SuppressWarnings("WeakerAccess")
 public class HiveSyncTool {
 
-  private static final Logger LOG = LogManager.getLogger(HiveSyncTool.class);
+  private static final Logger LOG = LoggerFactory.getLogger(HiveSyncTool.class);
   private final HoodieHiveClient hoodieHiveClient;
   public static final String SUFFIX_REALTIME_TABLE = "_rt";
   private final HiveSyncConfig cfg;
@@ -79,7 +79,7 @@ public class HiveSyncTool {
           cfg.tableName = originalTableName;
           break;
         default:
-          LOG.error("Unknown table type " + hoodieHiveClient.getTableType());
+          LOG.error("Unknown table type {}", hoodieHiveClient.getTableType());
           throw new InvalidDatasetException(hoodieHiveClient.getBasePath());
       }
     } catch (RuntimeException re) {
@@ -90,8 +90,8 @@ public class HiveSyncTool {
   }
 
   private void syncHoodieTable(boolean isRealTime) throws ClassNotFoundException {
-    LOG.info("Trying to sync hoodie table " + cfg.tableName + " with base path " + hoodieHiveClient.getBasePath()
-        + " of type " + hoodieHiveClient.getTableType());
+    LOG.info("Trying to sync hoodie table {} with base path {} of type {}",
+        cfg.tableName, hoodieHiveClient.getBasePath(), hoodieHiveClient.getTableType());
 
     // Check if the necessary table exists
     boolean tableExists = hoodieHiveClient.doesTableExist();
@@ -100,20 +100,20 @@
     // Sync schema if needed
     syncSchema(tableExists, isRealTime, schema);
 
-    LOG.info("Schema sync complete. Syncing partitions for " + cfg.tableName);
+    LOG.info("Schema sync complete. Syncing partitions for {}", cfg.tableName);
     // Get the last time we successfully synced partitions
     Option<String> lastCommitTimeSynced = Option.empty();
     if (tableExists) {
       lastCommitTimeSynced = hoodieHiveClient.getLastCommitTimeSynced();
     }
-    LOG.info("Last commit time synced was found to be " + lastCommitTimeSynced.orElse("null"));
+    LOG.info("Last commit time synced was found to be {}", lastCommitTimeSynced.orElse("null"));
     List<String> writtenPartitionsSince = hoodieHiveClient.getPartitionsWrittenToSince(lastCommitTimeSynced);
-    LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
+    LOG.info("Storage partitions scan complete. Found {}", writtenPartitionsSince.size());
     // Sync the partitions if needed
     syncPartitions(writtenPartitionsSince);
 
     hoodieHiveClient.updateLastCommitTimeSynced();
-    LOG.info("Sync complete for " + cfg.tableName);
+    LOG.info("Sync complete for {}", cfg.tableName);
   }
 
   /**
@@ -126,7 +126,7 @@ public class HiveSyncTool {
   private void syncSchema(boolean tableExists, boolean isRealTime, MessageType schema) throws ClassNotFoundException {
     // Check and sync schema
     if (!tableExists) {
-      LOG.info("Table " + cfg.tableName + " is not found. Creating it");
+      LOG.info("Table {} is not found. Creating it", cfg.tableName);
       if (!isRealTime) {
         // TODO - RO Table for MOR only after major compaction (UnboundedCompaction is default
         // for now)

[hudi] 02/03: [HUDI-457]Redo hudi-common log statements using SLF4J (#1161)

2021-08-05 Thread pwason
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a commit to branch redo-log
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit a0dc09ec06603a17ccf3962d2172c4bb87c1f3b4
Author: Li Jiaq 
AuthorDate: Fri Jan 10 13:06:42 2020 +0800

[HUDI-457]Redo hudi-common log statements using SLF4J (#1161)
---
 hudi-common/pom.xml|  6 
 .../hudi/common/model/HoodieCommitMetadata.java|  8 ++---
 .../hudi/common/model/HoodiePartitionMetadata.java | 11 +++
 .../common/model/HoodieRollingStatMetadata.java|  8 ++---
 .../hudi/common/table/HoodieTableConfig.java   |  8 ++---
 .../hudi/common/table/HoodieTableMetaClient.java   | 16 -
 .../table/log/AbstractHoodieLogRecordScanner.java  | 29 -
 .../hudi/common/table/log/HoodieLogFileReader.java | 12 +++
 .../hudi/common/table/log/HoodieLogFormat.java | 13 
 .../common/table/log/HoodieLogFormatReader.java|  8 ++---
 .../common/table/log/HoodieLogFormatWriter.java| 23 +++--
 .../table/log/HoodieMergedLogRecordScanner.java| 16 -
 .../table/timeline/HoodieActiveTimeline.java   | 30 -
 .../table/timeline/HoodieArchivedTimeline.java |  4 ---
 .../table/timeline/HoodieDefaultTimeline.java  |  4 ---
 .../table/view/AbstractTableFileSystemView.java| 19 +--
 .../common/table/view/FileSystemViewManager.java   | 15 -
 .../table/view/HoodieTableFileSystemView.java  |  8 ++---
 .../IncrementalTimelineSyncFileSystemView.java | 38 +++---
 .../table/view/PriorityBasedFileSystemView.java|  6 ++--
 .../view/RemoteHoodieTableFileSystemView.java  |  8 ++---
 .../table/view/RocksDbBasedFileSystemView.java | 26 +++
 .../view/SpillableMapBasedFileSystemView.java  | 14 
 .../apache/hudi/common/util/CompactionUtils.java   |  5 ---
 .../common/util/DFSPropertiesConfiguration.java|  8 ++---
 .../java/org/apache/hudi/common/util/FSUtils.java  | 12 +++
 .../hudi/common/util/FailSafeConsistencyGuard.java | 14 
 .../common/util/HoodieRecordSizeEstimator.java |  8 ++---
 .../org/apache/hudi/common/util/RocksDBDAO.java| 18 +-
 .../hudi/common/util/TimelineDiffHelper.java   | 10 +++---
 .../hudi/common/util/collection/DiskBasedMap.java  | 11 +++
 .../util/collection/ExternalSpillableMap.java  | 10 +++---
 .../common/util/queue/BoundedInMemoryExecutor.java |  6 ++--
 .../common/util/queue/BoundedInMemoryQueue.java|  6 ++--
 .../util/queue/FunctionBasedQueueProducer.java |  6 ++--
 .../util/queue/IteratorBasedQueueProducer.java |  6 ++--
 .../hudi/common/minicluster/HdfsTestService.java   | 10 +++---
 .../common/minicluster/ZookeeperTestService.java   | 10 +++---
 .../table/view/TestHoodieTableFileSystemView.java  |  8 ++---
 39 files changed, 232 insertions(+), 246 deletions(-)

diff --git a/hudi-common/pom.xml b/hudi-common/pom.xml
index c9aaf7a..f153119 100644
--- a/hudi-common/pom.xml
+++ b/hudi-common/pom.xml
@@ -166,6 +166,12 @@
     </dependency>
 
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-api</artifactId>
+      <version>${slf4j.version}</version>
+    </dependency>
+
     <dependency>
       <groupId>com.github.stefanbirkner</groupId>
       <artifactId>system-rules</artifactId>
       <version>1.16.0</version>
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
index 475f75c..9a69545 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
@@ -25,8 +25,8 @@ import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
 import com.fasterxml.jackson.annotation.PropertyAccessor;
 import com.fasterxml.jackson.databind.DeserializationFeature;
 import com.fasterxml.jackson.databind.ObjectMapper;
-import org.apache.log4j.LogManager;
-import org.apache.log4j.Logger;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
 import java.io.Serializable;
@@ -43,7 +43,7 @@ import java.util.Map;
 public class HoodieCommitMetadata implements Serializable {
 
   public static final String SCHEMA_KEY = "schema";
-  private static final Logger LOG = LogManager.getLogger(HoodieCommitMetadata.class);
+  private static final Logger LOG = LoggerFactory.getLogger(HoodieCommitMetadata.class);
   protected Map<String, List<HoodieWriteStat>> partitionToWriteStats;
   protected Boolean compacted;
 
@@ -118,7 +118,7 @@ public class HoodieCommitMetadata implements Serializable {
 
   public String toJsonString() throws IOException {
     if (partitionToWriteStats.containsKey(null)) {
-      LOG.info("partition path is null for " + partitionToWriteStats.get(null));
+      LOG.info("partition path is null for {}", partitionToWriteStats.get(null));
       partitionToWriteStats.remove(null);
     }
     return getObjectMapper().writerWithDefaultPrettyPrinter().writeValueAsString(this);

[hudi] branch redo-log created (now 8f8cd42)

2021-08-05 Thread pwason
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a change to branch redo-log
in repository https://gitbox.apache.org/repos/asf/hudi.git.


  at 8f8cd42  [HUDI-458] Redo hudi-hadoop-mr log statements using SLF4J (#1210)

This branch includes the following new commits:

 new ac105e6  [HUDI-459] Redo hudi-hive log statements using SLF4J (#1203)
 new a0dc09e  [HUDI-457]Redo hudi-common log statements using SLF4J (#1161)
 new 8f8cd42  [HUDI-458] Redo hudi-hadoop-mr log statements using SLF4J (#1210)

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[hudi] branch restructure-hudi-client created (now e7b1961)

2021-08-05 Thread pwason
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a change to branch restructure-hudi-client
in repository https://gitbox.apache.org/repos/asf/hudi.git.


  at e7b1961  [HUDI-542] Introduce a new pom module named hudi-writer-common (#1314)

This branch includes the following new commits:

 new e7b1961  [HUDI-542] Introduce a new pom module named hudi-writer-common (#1314)

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[hudi] 01/01: [HUDI-542] Introduce a new pom module named hudi-writer-common (#1314)

2021-08-05 Thread pwason
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a commit to branch restructure-hudi-client
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit e7b1961de082b626fdb27890eb4ff701f11dffb4
Author: vinoyang 
AuthorDate: Sat Feb 8 16:20:33 2020 +0800

[HUDI-542] Introduce a new pom module named hudi-writer-common (#1314)
---
 hudi-writer-common/pom.xml | 15 +
 .../org/apache/hudi/writer/common/Placeholder.java | 26 ++
 .../apache/hudi/writer/common/PlaceholderTest.java | 26 ++
 pom.xml|  1 +
 4 files changed, 68 insertions(+)

diff --git a/hudi-writer-common/pom.xml b/hudi-writer-common/pom.xml
new file mode 100644
index 000..eed8c5a
--- /dev/null
+++ b/hudi-writer-common/pom.xml
@@ -0,0 +1,15 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <parent>
+    <artifactId>hudi</artifactId>
+    <groupId>org.apache.hudi</groupId>
+    <version>0.5.2-SNAPSHOT</version>
+  </parent>
+  <modelVersion>4.0.0</modelVersion>
+
+  <artifactId>hudi-writer-common</artifactId>
+
+
+</project>
\ No newline at end of file
diff --git a/hudi-writer-common/src/main/java/org/apache/hudi/writer/common/Placeholder.java b/hudi-writer-common/src/main/java/org/apache/hudi/writer/common/Placeholder.java
new file mode 100644
index 000..93d8a35
--- /dev/null
+++ b/hudi-writer-common/src/main/java/org/apache/hudi/writer/common/Placeholder.java
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.writer.common;
+
+/**
+ * Used for placeholder purpose.
+ */
+public class Placeholder {
+
+}
diff --git a/hudi-writer-common/src/test/java/org/apache/hudi/writer/common/PlaceholderTest.java b/hudi-writer-common/src/test/java/org/apache/hudi/writer/common/PlaceholderTest.java
new file mode 100644
index 000..2a10a58
--- /dev/null
+++ b/hudi-writer-common/src/test/java/org/apache/hudi/writer/common/PlaceholderTest.java
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.writer.common;
+
+/**
+ * Used for placeholder purpose.
+ */
+public class PlaceholderTest {
+
+}
diff --git a/pom.xml b/pom.xml
index 8bdb4a6..b6782ad 100644
--- a/pom.xml
+++ b/pom.xml
@@ -51,6 +51,7 @@
     <module>packaging/hudi-timeline-server-bundle</module>
     <module>docker/hoodie/hadoop</module>
     <module>hudi-integ-test</module>
+    <module>hudi-writer-common</module>
   </modules>
 
   


[hudi] branch release-0.5.3 created (now be12de3)

2021-08-05 Thread pwason
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a change to branch release-0.5.3
in repository https://gitbox.apache.org/repos/asf/hudi.git.


  at be12de3  Removing spring repos from pom (#2481) (#2548)

This branch includes the following new commits:

 new be12de3  Removing spring repos from pom (#2481) (#2548)

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[hudi] 01/01: Removing spring repos from pom (#2481) (#2548)

2021-08-05 Thread pwason
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a commit to branch release-0.5.3
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit be12de33774065170a94e48b5874115189ef32d3
Author: Dezhi Cai 
AuthorDate: Thu Feb 18 13:02:44 2021 +0800

Removing spring repos from pom (#2481) (#2548)

- These are being deprecated
- Causes build issues when .m2 does not have this cached already

Co-authored-by: vinoth chandar 
---
 pom.xml | 8 
 1 file changed, 8 deletions(-)

diff --git a/pom.xml b/pom.xml
index 14cd7de..da16b10 100644
--- a/pom.xml
+++ b/pom.xml
@@ -863,14 +863,6 @@
       <id>confluent</id>
       <url>https://packages.confluent.io/maven/</url>
     </repository>
-    <repository>
-      <id>libs-milestone</id>
-      <url>https://repo.spring.io/libs-milestone/</url>
-    </repository>
-    <repository>
-      <id>libs-release</id>
-      <url>https://repo.spring.io/libs-release/</url>
-    </repository>
   </repositories>


[hudi] branch release-0.6.0 created (now e599764)

2021-08-05 Thread pwason
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a change to branch release-0.6.0
in repository https://gitbox.apache.org/repos/asf/hudi.git.


  at e599764  Removing spring repos from pom (#2481) (#2552)

This branch includes the following new commits:

 new e599764  Removing spring repos from pom (#2481) (#2552)

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[hudi] 01/01: Removing spring repos from pom (#2481) (#2552)

2021-08-05 Thread pwason
This is an automated email from the ASF dual-hosted git repository.

pwason pushed a commit to branch release-0.6.0
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit e599764c2dcfbbc15d6554fa0df55b7375e4a31d
Author: Dezhi Cai 
AuthorDate: Thu Feb 18 13:43:27 2021 +0800

Removing spring repos from pom (#2481) (#2552)

- These are being deprecated
- Causes build issues when .m2 does not have this cached already

Co-authored-by: vinoth chandar 
---
 pom.xml | 8 
 1 file changed, 8 deletions(-)

diff --git a/pom.xml b/pom.xml
index 68fe167..8791471 100644
--- a/pom.xml
+++ b/pom.xml
@@ -916,14 +916,6 @@
   confluent
   https://packages.confluent.io/maven/
 
-
-  libs-milestone
-  https://repo.spring.io/libs-milestone/
-
-
-  libs-release
-  https://repo.spring.io/libs-release/
-
   
 
   


[GitHub] [hudi] hudi-bot edited a comment on pull request #3360: [HUDI-2243] Support Time Travel Query For Hoodie Table

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#issuecomment-888079450


   
   ## CI report:
   
   * ee3fa851ec4a06a2b4ea9e1ea4006e4233d0f3fa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1396)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2243) Support Time Travel Query For Hoodie Table

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393686#comment-17393686
 ] 

ASF GitHub Bot commented on HUDI-2243:
--

hudi-bot edited a comment on pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#issuecomment-888079450


   
   ## CI report:
   
   * ee3fa851ec4a06a2b4ea9e1ea4006e4233d0f3fa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1396)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Support Time Travel Query For Hoodie Table
> --
>
> Key: HUDI-2243
> URL: https://issues.apache.org/jira/browse/HUDI-2243
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
>
> Support time travel queries for hoodie tables, for both COW and MOR tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3401: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3401:
URL: https://github.com/apache/hudi/pull/3401#issuecomment-892472052


   
   ## CI report:
   
   * 7fe0db6bdda8a2f543d068efa1cbb60682b2ef95 UNKNOWN
   * 0b34d55f238b889fb2fcc2526e4657ea981c431c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1397)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393697#comment-17393697
 ] 

ASF GitHub Bot commented on HUDI-2170:
--

hudi-bot edited a comment on pull request #3401:
URL: https://github.com/apache/hudi/pull/3401#issuecomment-892472052


   
   ## CI report:
   
   * 7fe0db6bdda8a2f543d068efa1cbb60682b2ef95 UNKNOWN
   * 0b34d55f238b889fb2fcc2526e4657ea981c431c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1397)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Always choose the latest record for HoodieRecordPayload
> ---
>
> Key: HUDI-2170
> URL: https://issues.apache.org/jira/browse/HUDI-2170
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Now in {{OverwriteWithLatestAvroPayload.preCombine}}, we still choose the old 
> record when the new record has the same preCombine field value as the old one; 
> it is more natural to keep the new incoming record instead. The 
> {{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already does 
> that.
> See issue: https://github.com/apache/hudi/issues/3266.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
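
The tie-breaking change described in HUDI-2170 above can be sketched as follows. This is a minimal illustrative payload, not Hudi's actual `OverwriteWithLatestAvroPayload` class; `orderingVal` stands in for the preCombine field:

```java
// Minimal sketch of the proposed preCombine semantics (hypothetical class,
// not the real Hudi payload): on a tie in the preCombine field, keep the
// NEW incoming record instead of the old one.
public class SketchPayload {
    final long orderingVal; // stands in for the preCombine field
    final String data;

    SketchPayload(long orderingVal, String data) {
        this.orderingVal = orderingVal;
        this.data = data;
    }

    // Called on the incoming record with the existing record as argument.
    SketchPayload preCombine(SketchPayload oldValue) {
        // Keep the old record only when it is strictly newer; on a tie,
        // prefer the incoming record, as the change proposes.
        return oldValue.orderingVal > this.orderingVal ? oldValue : this;
    }

    public static void main(String[] args) {
        SketchPayload oldRec = new SketchPayload(1L, "old");
        SketchPayload incoming = new SketchPayload(1L, "new");
        // Same ordering value: the incoming record wins under the new semantics.
        System.out.println(incoming.preCombine(oldRec).data); // prints "new"
    }
}
```

With the pre-change semantics, the same tie would have returned `oldRec` instead.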


[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3328: [HUDI-2208] Support Bulk Insert For Spark Sql

2021-08-05 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r683207185



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -248,6 +248,14 @@ object DataSourceWriteOptions {
 .withDocumentation("When set to true, will perform write operations 
directly using the spark native " +
   "`Row` representation, avoiding any additional conversion costs.")
 
+  /**
+   * Enable the bulk insert for sql insert statement.
+   */
+  val SQL_ENABLE_BULK_INSERT:ConfigProperty[String] = ConfigProperty

Review comment:
   Sounds reasonable. CTAS uses bulk_insert by default, and regular insert is 
the default for INSERT INTO.

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
   .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
   .toBoolean
 
-val operation = if (isOverwrite) {
-  if (table.partitionColumnNames.nonEmpty) {
-INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-  } else {
-INSERT_OPERATION_OPT_VAL
+val enableBulkInsert = 
parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+  DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+val isPartitionedTable = table.partitionColumnNames.nonEmpty
+val isPrimaryKeyTable = primaryColumns.nonEmpty
+val operation =
+  (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+case (true, true, _, _) =>
+  throw new IllegalArgumentException(s"Table with primaryKey can not 
use bulk insert.")

Review comment:
   For CTAS, we can relax this, because no data exists in the target table yet. 
We can just combine the input by primary key before the bulk insert to reach 
the same goal.

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
 
   // Convert to RDD[HoodieRecord]
   val genericRecords: RDD[GenericRecord] = 
HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || 
operation.equals(WriteOperationType.UPSERT);
+  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+operation.equals(WriteOperationType.UPSERT) ||
+
parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
   @vinothchandar well, I think INSERT_DROP_DUPS_OPT_KEY is somewhat different 
from COMBINE_BEFORE_INSERT_PROP.
   **INSERT_DROP_DUPS_OPT_KEY**: used to drop incoming records that duplicate 
records already in the target table.
   `COMBINE_BEFORE_INSERT_PROP`: used to combine duplicate records within the 
input.
   So they are not exactly the same config, IMO.
   
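
The distinction drawn here can be illustrated with a small sketch (hypothetical helper names, not Hudi API): one setting de-duplicates within the incoming batch, the other drops records whose key already exists in the target table.

```java
import java.util.*;
import java.util.stream.*;

public class DedupSketch {
    // COMBINE_BEFORE_INSERT-style: collapse duplicate keys WITHIN the input,
    // keeping the highest ordering value per key (TreeMap for stable output).
    static Map<String, Integer> combineInput(List<Map.Entry<String, Integer>> input) {
        return input.stream().collect(Collectors.toMap(
            Map.Entry::getKey, Map.Entry::getValue, Math::max, TreeMap::new));
    }

    // INSERT_DROP_DUPS-style: drop input records whose key already exists
    // in the target table.
    static Map<String, Integer> dropExisting(Map<String, Integer> combined,
                                             Set<String> tableKeys) {
        Map<String, Integer> out = new LinkedHashMap<>(combined);
        out.keySet().removeAll(tableKeys);
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
            Map.entry("k1", 1), Map.entry("k1", 2), Map.entry("k2", 1));
        Set<String> tableKeys = Set.of("k2"); // keys already in the target table

        Map<String, Integer> combined = combineInput(input);
        Map<String, Integer> inserted = dropExisting(combined, tableKeys);
        System.out.println(combined); // prints {k1=2, k2=1}
        System.out.println(inserted); // prints {k1=2}
    }
}
```

So combining acts on the input alone, while dropping duplicates also consults the target table, which is why the two configs are not interchangeable.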

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
   .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
   .toBoolean
 
-val operation = if (isOverwrite) {
-  if (table.partitionColumnNames.nonEmpty) {
-INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-  } else {
-INSERT_OPERATION_OPT_VAL
+val enableBulkInsert = 
parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+  DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+val isPartitionedTable = table.partitionColumnNames.nonEmpty
+val isPrimaryKeyTable = primaryColumns.nonEmpty
+val operation =
+  (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+case (true, true, _, _) =>
+  throw new IllegalArgumentException(s"Table with primaryKey can not 
use bulk insert.")
+case (_, true, true, _) if isPartitionedTable =>
+  throw new IllegalArgumentException(s"Insert Overwrite Partition can 
not use bulk insert.")
+case (_, true, _, true) =>
+  throw new IllegalArgumentException(s"Bulk insert cannot support drop 
duplication." +
+s" Please disable $INSERT_DROP_DUPS_OPT_KEY and try again.")
+// if enableBulkInsert is true, use bulk insert for the insert 
overwrite non-partitioned table.
+case (_, true, true, _) if !isPartitionedTable => 
BULK_INSERT_OPERATION_OPT_VAL
+// insert overwrite partition
+case (_, _, true, _) if isPartitionedTable => 
INSERT_OVERWRITE_OPERATION_OPT_VAL

Review comment:
   Well, in spark-sql, `insert overwrite` on a partitioned table does not mean 
overwriting the whole table; it only overwrites the affected partitions.

##
File

[jira] [Commented] (HUDI-2208) [SQL] Support Bulk Insert For Spark Sql

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393699#comment-17393699
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r683207185



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -248,6 +248,14 @@ object DataSourceWriteOptions {
 .withDocumentation("When set to true, will perform write operations 
directly using the spark native " +
   "`Row` representation, avoiding any additional conversion costs.")
 
+  /**
+   * Enable the bulk insert for sql insert statement.
+   */
+  val SQL_ENABLE_BULK_INSERT:ConfigProperty[String] = ConfigProperty

Review comment:
   Sounds reasonable. CTAS uses bulk_insert by default, and regular insert is 
the default for INSERT INTO.

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
   .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
   .toBoolean
 
-val operation = if (isOverwrite) {
-  if (table.partitionColumnNames.nonEmpty) {
-INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-  } else {
-INSERT_OPERATION_OPT_VAL
+val enableBulkInsert = 
parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+  DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+val isPartitionedTable = table.partitionColumnNames.nonEmpty
+val isPrimaryKeyTable = primaryColumns.nonEmpty
+val operation =
+  (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+case (true, true, _, _) =>
+  throw new IllegalArgumentException(s"Table with primaryKey can not 
use bulk insert.")

Review comment:
   For CTAS, we can relax this, because no data exists in the target table yet. 
We can just combine the input by primary key before the bulk insert to reach 
the same goal.

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
 
   // Convert to RDD[HoodieRecord]
   val genericRecords: RDD[GenericRecord] = 
HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || 
operation.equals(WriteOperationType.UPSERT);
+  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+operation.equals(WriteOperationType.UPSERT) ||
+
parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
   @vinothchandar well, I think INSERT_DROP_DUPS_OPT_KEY is somewhat different 
from COMBINE_BEFORE_INSERT_PROP.
   **INSERT_DROP_DUPS_OPT_KEY**: used to drop incoming records that duplicate 
records already in the target table.
   `COMBINE_BEFORE_INSERT_PROP`: used to combine duplicate records within the 
input.
   So they are not exactly the same config, IMO.
   

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
   .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
   .toBoolean
 
-val operation = if (isOverwrite) {
-  if (table.partitionColumnNames.nonEmpty) {
-INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-  } else {
-INSERT_OPERATION_OPT_VAL
+val enableBulkInsert = 
parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+  DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+val isPartitionedTable = table.partitionColumnNames.nonEmpty
+val isPrimaryKeyTable = primaryColumns.nonEmpty
+val operation =
+  (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+case (true, true, _, _) =>
+  throw new IllegalArgumentException(s"Table with primaryKey can not 
use bulk insert.")
+case (_, true, true, _) if isPartitionedTable =>
+  throw new IllegalArgumentException(s"Insert Overwrite Partition can 
not use bulk insert.")
+case (_, true, _, true) =>
+  throw new IllegalArgumentException(s"Bulk insert cannot support drop 
duplication." +
+s" Please disable $INSERT_DROP_DUPS_OPT_KEY and try again.")
+// if enableBulkInsert is true, use bulk insert for the insert 
overwrite non-partitioned table.
+case (_, true, true, _) if !isPartitionedTable => 
BULK_INSERT_OPERATION_OPT_VAL
+// insert overwrite partition
+case (_, _, true, _) if isPartitioned

[GitHub] [hudi] hudi-bot edited a comment on pull request #3289: [HUDI-2187] Add a shim layer to support multiple hive version

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3289:
URL: https://github.com/apache/hudi/pull/3289#issuecomment-881900670


   
   ## CI report:
   
   * 04cc6dc7a378f36d70c84269baeaae1bd935fdb6 UNKNOWN
   * 316b83a6f045c9cc57f049d092463402d4139b95 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1398)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2187) Hive integration Improvement

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393711#comment-17393711
 ] 

ASF GitHub Bot commented on HUDI-2187:
--

hudi-bot edited a comment on pull request #3289:
URL: https://github.com/apache/hudi/pull/3289#issuecomment-881900670


   
   ## CI report:
   
   * 04cc6dc7a378f36d70c84269baeaae1bd935fdb6 UNKNOWN
   * 316b83a6f045c9cc57f049d092463402d4139b95 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1398)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Hive integration Improvement
> ---
>
> Key: HUDI-2187
> URL: https://issues.apache.org/jira/browse/HUDI-2187
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Major
>  Labels: pull-request-available
>
> See the details from RFC doc
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+31%3A+Hive+integration+Improvment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3387: [HUDI-2233] Use HMS To Sync Hive Meta For Spark Sql

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3387:
URL: https://github.com/apache/hudi/pull/3387#issuecomment-891570386


   
   ## CI report:
   
   * dae0d69eade3ba95d39e37c1851a56534f80e007 UNKNOWN
   * 6043d6a54b7e2d70a071f556b4eb3da8e3992e2c UNKNOWN
   * 5dd343e30f7560d258887843a811373feb1a6931 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1395)
 
   * b63e015a5795413f21a2c3b96189d1bac832b568 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2233) [SQL] Hive sync is not working

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393716#comment-17393716
 ] 

ASF GitHub Bot commented on HUDI-2233:
--

hudi-bot edited a comment on pull request #3387:
URL: https://github.com/apache/hudi/pull/3387#issuecomment-891570386


   
   ## CI report:
   
   * dae0d69eade3ba95d39e37c1851a56534f80e007 UNKNOWN
   * 6043d6a54b7e2d70a071f556b4eb3da8e3992e2c UNKNOWN
   * 5dd343e30f7560d258887843a811373feb1a6931 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1395)
 
   * b63e015a5795413f21a2c3b96189d1bac832b568 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> [SQL] Hive sync is not working
> --
>
> Key: HUDI-2233
> URL: https://issues.apache.org/jira/browse/HUDI-2233
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
>  
> {code:java}
> java.lang.NoClassDefFoundError: org/apache/calcite/rel/type/RelDataTypeSystem
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:318)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:484)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:458)
> at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:448)
> at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:426)
> at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:322)
> at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:230)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3393: [HUDI-1842] Spark Sql Support For The Exists Hoodie Table

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3393:
URL: https://github.com/apache/hudi/pull/3393#issuecomment-891812944


   
   ## CI report:
   
   * 314a8f66727958ac7830c9d82e8a7ea97be74900 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1394)
 
   * 921d21fc6732fe51296e21fd3a26fa11c16bfca3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-1842) [SQL] Spark Sql Support For The Exists Hoodie Table

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393720#comment-17393720
 ] 

ASF GitHub Bot commented on HUDI-1842:
--

hudi-bot edited a comment on pull request #3393:
URL: https://github.com/apache/hudi/pull/3393#issuecomment-891812944


   
   ## CI report:
   
   * 314a8f66727958ac7830c9d82e8a7ea97be74900 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1394)
 
   * 921d21fc6732fe51296e21fd3a26fa11c16bfca3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> [SQL] Spark Sql Support For The Exists Hoodie Table
> ---
>
> Key: HUDI-1842
> URL: https://issues.apache.org/jira/browse/HUDI-1842
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
> In order to support spark sql for hoodie, we persist some table properties to 
> hoodie.properties, e.g. primaryKey, preCombineField, and partition columns. 
> For existing hoodie tables, these properties are missing. We need to add some 
> code in UpgradeDowngrade to support spark sql for existing tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3393: [HUDI-1842] Spark Sql Support For The Exists Hoodie Table

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3393:
URL: https://github.com/apache/hudi/pull/3393#issuecomment-891812944


   
   ## CI report:
   
   * 314a8f66727958ac7830c9d82e8a7ea97be74900 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1394)
 
   * 921d21fc6732fe51296e21fd3a26fa11c16bfca3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1399)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-1842) [SQL] Spark Sql Support For The Exists Hoodie Table

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393721#comment-17393721
 ] 

ASF GitHub Bot commented on HUDI-1842:
--

hudi-bot edited a comment on pull request #3393:
URL: https://github.com/apache/hudi/pull/3393#issuecomment-891812944


   
   ## CI report:
   
   * 314a8f66727958ac7830c9d82e8a7ea97be74900 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1394)
 
   * 921d21fc6732fe51296e21fd3a26fa11c16bfca3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1399)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> [SQL] Spark Sql Support For The Exists Hoodie Table
> ---
>
> Key: HUDI-1842
> URL: https://issues.apache.org/jira/browse/HUDI-1842
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
> In order to support spark sql for hoodie, we persist some table properties to 
> hoodie.properties, e.g. primaryKey, preCombineField, and partition columns. 
> For existing hoodie tables, these properties are missing. We need to add some 
> code in UpgradeDowngrade to support spark sql for existing tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhedoubushishi commented on pull request #1975: [HUDI-1194] Refactor HoodieHiveClient based on the way to call Hive API

2021-08-05 Thread GitBox


zhedoubushishi commented on pull request #1975:
URL: https://github.com/apache/hudi/pull/1975#issuecomment-893290649


   > Sorry for the delay in responding. Similar PR was in progress and has been 
merged #2879.
   > We can close this PR.
   
   Yes feel free to close this PR.






[GitHub] [hudi] zhedoubushishi closed pull request #1975: [HUDI-1194] Refactor HoodieHiveClient based on the way to call Hive API

2021-08-05 Thread GitBox


zhedoubushishi closed pull request #1975:
URL: https://github.com/apache/hudi/pull/1975


   






[GitHub] [hudi] zhedoubushishi edited a comment on pull request #1975: [HUDI-1194] Refactor HoodieHiveClient based on the way to call Hive API

2021-08-05 Thread GitBox


zhedoubushishi edited a comment on pull request #1975:
URL: https://github.com/apache/hudi/pull/1975#issuecomment-893290649


   > Sorry for the delay in responding. Similar PR was in progress and has been 
merged #2879.
   > We can close this PR.
   
   Yes, since it's duplicated work, I closed this PR.






[jira] [Commented] (HUDI-1194) Reorganize HoodieHiveClient and make it fully support Hive Metastore API

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393727#comment-17393727
 ] 

ASF GitHub Bot commented on HUDI-1194:
--

zhedoubushishi closed pull request #1975:
URL: https://github.com/apache/hudi/pull/1975


   




> Reorganize HoodieHiveClient and make it fully support Hive Metastore API
> 
>
> Key: HUDI-1194
> URL: https://issues.apache.org/jira/browse/HUDI-1194
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Wenning Ding
>Priority: Major
>  Labels: pull-request-available
>
> Currently there are three ways in HoodieHiveClient to perform Hive 
> functionalities. One is through Hive JDBC, one is through Hive Metastore API. 
> One is through Hive Driver.
>  
>  There’s a parameter called +{{hoodie.datasource.hive_sync.use_jdbc}}+ to control whether to use Hive JDBC. However, this parameter does not accurately describe the situation.
>  Basically, the current logic is: when +*use_jdbc*+ is set to true, most of the methods in HoodieHiveClient use JDBC, and a few use the Hive Metastore API.
>  When +*use_jdbc*+ is set to false, most of the methods in HoodieHiveClient use the Hive Driver, and a few use the Hive Metastore API.
> Here is a table showing what is actually used when use_jdbc is set to true/false.
> |Method|use_jdbc=true|use_jdbc=false|
> |{{addPartitionsToTable}}|JDBC|Hive Driver|
> |{{updatePartitionsToTable}}|JDBC|Hive Driver|
> |{{scanTablePartitions}}|Metastore API|Metastore API|
> |{{updateTableDefinition}}|JDBC|Hive Driver|
> |{{createTable}}|JDBC|Hive Driver|
> |{{getTableSchema}}|JDBC|Metastore API|
> |{{doesTableExist}}|Metastore API|Metastore API|
> |{{getLastCommitTimeSynced}}|Metastore API|Metastore API|
> [~bschell] and I developed Metastore API implementations for {{createTable}}, {{addPartitionsToTable}}, {{updatePartitionsToTable}}, and {{updateTableDefinition}}, which will be helpful for several issues, e.g. resolving the null-partition hive sync issue and supporting ALTER_TABLE cascade with the AWS Glue catalog.
> But it seems hard to organize three implementations within the current config, so we plan to separate HoodieHiveClient into three classes:
>  # {{HoodieHiveClient}}, which implements all the APIs through the Metastore API.
>  # {{HoodieHiveJDBCClient}}, which extends HoodieHiveClient and overrides several of the APIs through Hive JDBC.
>  # {{HoodieHiveDriverClient}}, which extends HoodieHiveClient and overrides several of the APIs through the Hive Driver.
> And we introduce a new parameter, +*hoodie.datasource.hive_sync.hive_client_class*+, which lets you choose which Hive client class to use.
>  
>  
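For illustration, the proposed split can be sketched as a small class hierarchy plus a factory keyed by the new config value. This is a minimal, self-contained sketch: the stub method bodies merely report which transport would be used, and the factory helper is hypothetical; only the class names and the override pattern come from the plan above.

```java
// Sketch of the proposed client hierarchy. The real classes would wrap a
// Metastore client, a JDBC connection, or an embedded Hive Driver; here each
// stub just reports which transport the call would go through.
class HoodieHiveClient {
  // Base implementation: everything goes through the Metastore API.
  public String createTable(String name) { return "metastore:" + name; }
  public String getTableSchema(String name) { return "metastore-schema:" + name; }
}

class HoodieHiveJDBCClient extends HoodieHiveClient {
  // Overrides only the DDL-style calls to go through Hive JDBC.
  @Override
  public String createTable(String name) { return "jdbc:" + name; }
}

class HoodieHiveDriverClient extends HoodieHiveClient {
  // Overrides only the DDL-style calls to go through the Hive Driver.
  @Override
  public String createTable(String name) { return "driver:" + name; }
}

public class HiveClientFactory {
  // hoodie.datasource.hive_sync.hive_client_class would carry one of these names.
  public static HoodieHiveClient forName(String clientClass) {
    switch (clientClass) {
      case "HoodieHiveJDBCClient":   return new HoodieHiveJDBCClient();
      case "HoodieHiveDriverClient": return new HoodieHiveDriverClient();
      default:                       return new HoodieHiveClient();
    }
  }

  public static void main(String[] args) {
    // Un-overridden methods fall back to the Metastore path, matching the table above.
    System.out.println(forName("HoodieHiveJDBCClient").createTable("t1"));    // jdbc:t1
    System.out.println(forName("HoodieHiveJDBCClient").getTableSchema("t1")); // metastore-schema:t1
    System.out.println(forName("HoodieHiveDriverClient").createTable("t1"));  // driver:t1
  }
}
```

Because the JDBC and Driver variants override only a subset of methods, the "few methods always use the Metastore API" behavior from the table falls out of plain inheritance rather than per-method flags.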



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1194) Reorganize HoodieHiveClient and make it fully support Hive Metastore API

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393728#comment-17393728
 ] 

ASF GitHub Bot commented on HUDI-1194:
--

zhedoubushishi commented on pull request #1975:
URL: https://github.com/apache/hudi/pull/1975#issuecomment-893290649


   > Sorry for the delay in responding. Similar PR was in progress and has been 
merged #2879.
   > We can close this PR.
   
   Yes feel free to close this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reorganize HoodieHiveClient and make it fully support Hive Metastore API
> 
>
> Key: HUDI-1194
> URL: https://issues.apache.org/jira/browse/HUDI-1194
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Wenning Ding
>Priority: Major
>  Labels: pull-request-available
>
> Currently there are three ways in HoodieHiveClient to perform Hive functionality: one through Hive JDBC, one through the Hive Metastore API, and one through the Hive Driver.
>  
>  There’s a parameter called +{{hoodie.datasource.hive_sync.use_jdbc}}+ to control whether to use Hive JDBC. However, this parameter does not accurately describe the situation.
>  Basically, the current logic is: when +*use_jdbc*+ is set to true, most of the methods in HoodieHiveClient use JDBC, and a few use the Hive Metastore API.
>  When +*use_jdbc*+ is set to false, most of the methods in HoodieHiveClient use the Hive Driver, and a few use the Hive Metastore API.
> Here is a table showing what is actually used when use_jdbc is set to true/false.
> |Method|use_jdbc=true|use_jdbc=false|
> |{{addPartitionsToTable}}|JDBC|Hive Driver|
> |{{updatePartitionsToTable}}|JDBC|Hive Driver|
> |{{scanTablePartitions}}|Metastore API|Metastore API|
> |{{updateTableDefinition}}|JDBC|Hive Driver|
> |{{createTable}}|JDBC|Hive Driver|
> |{{getTableSchema}}|JDBC|Metastore API|
> |{{doesTableExist}}|Metastore API|Metastore API|
> |{{getLastCommitTimeSynced}}|Metastore API|Metastore API|
> [~bschell] and I developed Metastore API implementations for {{createTable}}, {{addPartitionsToTable}}, {{updatePartitionsToTable}}, and {{updateTableDefinition}}, which will be helpful for several issues, e.g. resolving the null-partition hive sync issue and supporting ALTER_TABLE cascade with the AWS Glue catalog.
> But it seems hard to organize three implementations within the current config, so we plan to separate HoodieHiveClient into three classes:
>  # {{HoodieHiveClient}}, which implements all the APIs through the Metastore API.
>  # {{HoodieHiveJDBCClient}}, which extends HoodieHiveClient and overrides several of the APIs through Hive JDBC.
>  # {{HoodieHiveDriverClient}}, which extends HoodieHiveClient and overrides several of the APIs through the Hive Driver.
> And we introduce a new parameter, +*hoodie.datasource.hive_sync.hive_client_class*+, which lets you choose which Hive client class to use.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1194) Reorganize HoodieHiveClient and make it fully support Hive Metastore API

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393730#comment-17393730
 ] 

ASF GitHub Bot commented on HUDI-1194:
--

zhedoubushishi edited a comment on pull request #1975:
URL: https://github.com/apache/hudi/pull/1975#issuecomment-893290649


   > Sorry for the delay in responding. Similar PR was in progress and has been 
merged #2879.
   > We can close this PR.
   
   Yes since it's duplicated work, closed this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reorganize HoodieHiveClient and make it fully support Hive Metastore API
> 
>
> Key: HUDI-1194
> URL: https://issues.apache.org/jira/browse/HUDI-1194
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Wenning Ding
>Priority: Major
>  Labels: pull-request-available
>
> Currently there are three ways in HoodieHiveClient to perform Hive functionality: one through Hive JDBC, one through the Hive Metastore API, and one through the Hive Driver.
>  
>  There’s a parameter called +{{hoodie.datasource.hive_sync.use_jdbc}}+ to control whether to use Hive JDBC. However, this parameter does not accurately describe the situation.
>  Basically, the current logic is: when +*use_jdbc*+ is set to true, most of the methods in HoodieHiveClient use JDBC, and a few use the Hive Metastore API.
>  When +*use_jdbc*+ is set to false, most of the methods in HoodieHiveClient use the Hive Driver, and a few use the Hive Metastore API.
> Here is a table showing what is actually used when use_jdbc is set to true/false.
> |Method|use_jdbc=true|use_jdbc=false|
> |{{addPartitionsToTable}}|JDBC|Hive Driver|
> |{{updatePartitionsToTable}}|JDBC|Hive Driver|
> |{{scanTablePartitions}}|Metastore API|Metastore API|
> |{{updateTableDefinition}}|JDBC|Hive Driver|
> |{{createTable}}|JDBC|Hive Driver|
> |{{getTableSchema}}|JDBC|Metastore API|
> |{{doesTableExist}}|Metastore API|Metastore API|
> |{{getLastCommitTimeSynced}}|Metastore API|Metastore API|
> [~bschell] and I developed Metastore API implementations for {{createTable}}, {{addPartitionsToTable}}, {{updatePartitionsToTable}}, and {{updateTableDefinition}}, which will be helpful for several issues, e.g. resolving the null-partition hive sync issue and supporting ALTER_TABLE cascade with the AWS Glue catalog.
> But it seems hard to organize three implementations within the current config, so we plan to separate HoodieHiveClient into three classes:
>  # {{HoodieHiveClient}}, which implements all the APIs through the Metastore API.
>  # {{HoodieHiveJDBCClient}}, which extends HoodieHiveClient and overrides several of the APIs through Hive JDBC.
>  # {{HoodieHiveDriverClient}}, which extends HoodieHiveClient and overrides several of the APIs through the Hive Driver.
> And we introduce a new parameter, +*hoodie.datasource.hive_sync.hive_client_class*+, which lets you choose which Hive client class to use.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3387: [HUDI-2233] Use HMS To Sync Hive Meta For Spark Sql

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3387:
URL: https://github.com/apache/hudi/pull/3387#issuecomment-891570386


   
   ## CI report:
   
   * dae0d69eade3ba95d39e37c1851a56534f80e007 UNKNOWN
   * 6043d6a54b7e2d70a071f556b4eb3da8e3992e2c UNKNOWN
   * 5dd343e30f7560d258887843a811373feb1a6931 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1395)
 
   * b63e015a5795413f21a2c3b96189d1bac832b568 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1400)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2233) [SQL] Hive sync is not working

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393732#comment-17393732
 ] 

ASF GitHub Bot commented on HUDI-2233:
--

hudi-bot edited a comment on pull request #3387:
URL: https://github.com/apache/hudi/pull/3387#issuecomment-891570386


   
   ## CI report:
   
   * dae0d69eade3ba95d39e37c1851a56534f80e007 UNKNOWN
   * 6043d6a54b7e2d70a071f556b4eb3da8e3992e2c UNKNOWN
   * 5dd343e30f7560d258887843a811373feb1a6931 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1395)
 
   * b63e015a5795413f21a2c3b96189d1bac832b568 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1400)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [SQL] Hive sync is not working
> --
>
> Key: HUDI-2233
> URL: https://issues.apache.org/jira/browse/HUDI-2233
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
>  
> {code:java}
> java.lang.NoClassDefFoundError: org/apache/calcite/rel/type/RelDataTypeSystem
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:318)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:484)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>   at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:458)
>   at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:448)
>   at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:426)
>   at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:322)
>   at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:230)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2277) Let HoodieDeltaStreamer reading ORC files using ORCDFSSource

2021-08-05 Thread Yue Zhang (Jira)
Yue Zhang created HUDI-2277:
---

 Summary: Let HoodieDeltaStreamer reading ORC files using 
ORCDFSSource
 Key: HUDI-2277
 URL: https://issues.apache.org/jira/browse/HUDI-2277
 Project: Apache Hudi
  Issue Type: Task
Reporter: Yue Zhang


Develop a new Source named ORCDFSSource, extending RowSource.

Now, HoodieDeltaStreamer can read ORC files directly using ORCDFSSource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhangyue19921010 opened a new pull request #3413: [HUDI-2277] Let HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-08-05 Thread GitBox


zhangyue19921010 opened a new pull request #3413:
URL: https://github.com/apache/hudi/pull/3413


   
https://issues.apache.org/jira/projects/HUDI/issues/HUDI-2277?filter=reportedbyme
   
   ## What is the purpose of the pull request
   Develop a new Source named ORCDFSSource, extending RowSource.
   
   Now, HoodieDeltaStreamer can read ORC files directly using ORCDFSSource.
   
   Also added the necessary UTs, which were tested on our local env.
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2277) Let HoodieDeltaStreamer reading ORC files using ORCDFSSource

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393747#comment-17393747
 ] 

ASF GitHub Bot commented on HUDI-2277:
--

zhangyue19921010 opened a new pull request #3413:
URL: https://github.com/apache/hudi/pull/3413


   
https://issues.apache.org/jira/projects/HUDI/issues/HUDI-2277?filter=reportedbyme
   
   ## What is the purpose of the pull request
   Develop a new Source named ORCDFSSource, extending RowSource.
   
   Now, HoodieDeltaStreamer can read ORC files directly using ORCDFSSource.
   
   Also added the necessary UTs, which were tested on our local env.
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Let HoodieDeltaStreamer reading ORC files using ORCDFSSource
> 
>
> Key: HUDI-2277
> URL: https://issues.apache.org/jira/browse/HUDI-2277
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>
> Develop a new Source named ORCDFSSource, extending RowSource.
> Now, HoodieDeltaStreamer can read ORC files directly using ORCDFSSource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2277) Let HoodieDeltaStreamer reading ORC files using ORCDFSSource

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2277:
-
Labels: pull-request-available  (was: )

> Let HoodieDeltaStreamer reading ORC files using ORCDFSSource
> 
>
> Key: HUDI-2277
> URL: https://issues.apache.org/jira/browse/HUDI-2277
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Develop a new Source named ORCDFSSource, extending RowSource.
> Now, HoodieDeltaStreamer can read ORC files directly using ORCDFSSource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] danny0405 closed pull request #3403: [HUDI-2274] Allows INSERT duplicates for Flink MOR table

2021-08-05 Thread GitBox


danny0405 closed pull request #3403:
URL: https://github.com/apache/hudi/pull/3403


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2274) Allows INSERT duplicates for Flink MOR table

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393749#comment-17393749
 ] 

ASF GitHub Bot commented on HUDI-2274:
--

danny0405 opened a new pull request #3403:
URL: https://github.com/apache/hudi/pull/3403


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allows INSERT duplicates for Flink MOR table
> 
>
> Key: HUDI-2274
> URL: https://issues.apache.org/jira/browse/HUDI-2274
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2274) Allows INSERT duplicates for Flink MOR table

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393748#comment-17393748
 ] 

ASF GitHub Bot commented on HUDI-2274:
--

danny0405 closed pull request #3403:
URL: https://github.com/apache/hudi/pull/3403


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allows INSERT duplicates for Flink MOR table
> 
>
> Key: HUDI-2274
> URL: https://issues.apache.org/jira/browse/HUDI-2274
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot commented on pull request #3413: [HUDI-2277] Let HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-08-05 Thread GitBox


hudi-bot commented on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 45fbd4f73a6ccd0918e545702900351a2ed1070b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2277) Let HoodieDeltaStreamer reading ORC files using ORCDFSSource

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393750#comment-17393750
 ] 

ASF GitHub Bot commented on HUDI-2277:
--

hudi-bot commented on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 45fbd4f73a6ccd0918e545702900351a2ed1070b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Let HoodieDeltaStreamer reading ORC files using ORCDFSSource
> 
>
> Key: HUDI-2277
> URL: https://issues.apache.org/jira/browse/HUDI-2277
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Develop a new Source named ORCDFSSource, extending RowSource.
> Now, HoodieDeltaStreamer can read ORC files directly using ORCDFSSource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3413: [HUDI-2277] Let HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 45fbd4f73a6ccd0918e545702900351a2ed1070b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1401)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2277) Let HoodieDeltaStreamer reading ORC files using ORCDFSSource

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393751#comment-17393751
 ] 

ASF GitHub Bot commented on HUDI-2277:
--

hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 45fbd4f73a6ccd0918e545702900351a2ed1070b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1401)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Let HoodieDeltaStreamer reading ORC files using ORCDFSSource
> 
>
> Key: HUDI-2277
> URL: https://issues.apache.org/jira/browse/HUDI-2277
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Develop a new Source named ORCDFSSource, extending RowSource.
> Now, HoodieDeltaStreamer can read ORC files directly using ORCDFSSource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3393: [HUDI-1842] Spark Sql Support For The Exists Hoodie Table

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3393:
URL: https://github.com/apache/hudi/pull/3393#issuecomment-891812944


   
   ## CI report:
   
   * 921d21fc6732fe51296e21fd3a26fa11c16bfca3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1399)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1842) [SQL] Spark Sql Support For The Exists Hoodie Table

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393758#comment-17393758
 ] 

ASF GitHub Bot commented on HUDI-1842:
--

hudi-bot edited a comment on pull request #3393:
URL: https://github.com/apache/hudi/pull/3393#issuecomment-891812944


   
   ## CI report:
   
   * 921d21fc6732fe51296e21fd3a26fa11c16bfca3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1399)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [SQL] Spark Sql Support For The Exists Hoodie Table
> ---
>
> Key: HUDI-1842
> URL: https://issues.apache.org/jira/browse/HUDI-1842
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
> In order to support Spark SQL for Hudi, we persist some table properties to hoodie.properties, e.g. primaryKey, preCombineField, and the partition columns.
> For existing Hudi tables, these properties are missing. We need to add some code in UpgradeDowngrade to support Spark SQL for existing tables.
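For context, the properties in question live in the table's {{.hoodie/hoodie.properties}} file. A sketch of the entries involved might look like the following; the key names and values here are illustrative assumptions, not taken from this thread:

```properties
# Illustrative .hoodie/hoodie.properties entries (key names are assumptions)
hoodie.table.name=my_table
# primaryKey
hoodie.table.recordkey.fields=uuid
# preCombineField
hoodie.table.precombine.field=ts
# partition columns
hoodie.table.partition.fields=dt
```

The idea is that UpgradeDowngrade would backfill entries like these for tables created before the Spark SQL support landed.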



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3387: [HUDI-2233] Use HMS To Sync Hive Meta For Spark Sql

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3387:
URL: https://github.com/apache/hudi/pull/3387#issuecomment-891570386


   
   ## CI report:
   
   * dae0d69eade3ba95d39e37c1851a56534f80e007 UNKNOWN
   * 6043d6a54b7e2d70a071f556b4eb3da8e3992e2c UNKNOWN
   * b63e015a5795413f21a2c3b96189d1bac832b568 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1400)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2233) [SQL] Hive sync is not working

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393768#comment-17393768
 ] 

ASF GitHub Bot commented on HUDI-2233:
--

hudi-bot edited a comment on pull request #3387:
URL: https://github.com/apache/hudi/pull/3387#issuecomment-891570386


   
   ## CI report:
   
   * dae0d69eade3ba95d39e37c1851a56534f80e007 UNKNOWN
   * 6043d6a54b7e2d70a071f556b4eb3da8e3992e2c UNKNOWN
   * b63e015a5795413f21a2c3b96189d1bac832b568 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1400)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> [SQL] Hive sync is not working
> --
>
> Key: HUDI-2233
> URL: https://issues.apache.org/jira/browse/HUDI-2233
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
>  
> {code:java}
> java.lang.NoClassDefFoundError: org/apache/calcite/rel/type/RelDataTypeSystem 
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:318)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:484) at 
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) at 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:458)
>  at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:448)
>  at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:426)
>  at 
> org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:322) 
> at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:230)
> {code}
>  
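The `NoClassDefFoundError` above indicates that `org.apache.calcite.rel.type.RelDataTypeSystem` is missing from the classpath when Hive's `SemanticAnalyzerFactory` compiles the sync DDL. As a hedged illustration only (the right fix depends on the bundle and Hive distribution in use), one common remediation is to make a `calcite-core` matching the Hive version available on the sync tool's classpath, e.g. as a Maven dependency:

```xml
<!-- Illustrative only: the version must match the Calcite shipped with your Hive. -->
<dependency>
  <groupId>org.apache.calcite</groupId>
  <artifactId>calcite-core</artifactId>
  <version>1.16.0</version>
</dependency>
```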





[GitHub] [hudi] hudi-bot edited a comment on pull request #3328: [HUDI-2208] Support Bulk Insert For Spark Sql

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
   * e88244d233d323364916c4fc240083566ddc4e56 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1272)
 
   * b3e8a6d36161d5da60a1429e518253e1bff92a9d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2208) [SQL] Support Bulk Insert For Spark Sql

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393781#comment-17393781
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

hudi-bot edited a comment on pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#issuecomment-884869427


   
   ## CI report:
   
   * 9c9f804618dd0275abdae10673c21bf1f5737caf UNKNOWN
   * 50539ec543951e7a4442798ac7c66e5dc3d3705a UNKNOWN
   * f8b449c31ee8601542f00e3cc15fbcab77da7787 UNKNOWN
   * bb9a6d83361f3a652b2c902b1b3dc846de617d93 UNKNOWN
   * e88244d233d323364916c4fc240083566ddc4e56 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1272)
 
   * b3e8a6d36161d5da60a1429e518253e1bff92a9d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> [SQL] Support Bulk Insert For Spark Sql
> ---
>
> Key: HUDI-2208
> URL: https://issues.apache.org/jira/browse/HUDI-2208
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
> Support bulk insert for Spark SQL.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3413: [HUDI-2277] Let HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 45fbd4f73a6ccd0918e545702900351a2ed1070b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1401)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2277) Let HoodieDeltaStreamer reading ORC files using ORCDFSSource

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393783#comment-17393783
 ] 

ASF GitHub Bot commented on HUDI-2277:
--

hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636


   
   ## CI report:
   
   * 45fbd4f73a6ccd0918e545702900351a2ed1070b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1401)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Let HoodieDeltaStreamer reading ORC files using ORCDFSSource
> 
>
> Key: HUDI-2277
> URL: https://issues.apache.org/jira/browse/HUDI-2277
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Develop a new source named ORCDFSSource, extending RowSource.
> With this, HoodieDeltaStreamer can read ORC files directly using ORCDFSSource.





[GitHub] [hudi] hudi-bot commented on pull request #3407: [HUDI-2268] Add upgrade/downgrade to and from 0.9.0

2021-08-05 Thread GitBox


hudi-bot commented on pull request #3407:
URL: https://github.com/apache/hudi/pull/3407#issuecomment-892882427


   
   ## CI report:
   
   * 4f094a9786e520d69ceb51437af5a224cca63579 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2268) Upgrade hoodie table to 0.9.0

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393811#comment-17393811
 ] 

ASF GitHub Bot commented on HUDI-2268:
--

hudi-bot commented on pull request #3407:
URL: https://github.com/apache/hudi/pull/3407#issuecomment-892882427


   
   ## CI report:
   
   * 4f094a9786e520d69ceb51437af5a224cca63579 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Upgrade hoodie table to 0.9.0
> -
>
> Key: HUDI-2268
> URL: https://issues.apache.org/jira/browse/HUDI-2268
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Usability
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Wrt upgrading/downgrading hoodie.properties, here is how we can go.
> Add a new table version, 2.
> Add an upgrade step before every write operation:
>      Check if the existing hoodie.properties is at an older version. If yes, 
> perform the upgrade step to version 2 (either from 0 to 2 or from 1 to 2). 
> This essentially means that we need to add the new properties pertaining to 
> SQL DML to hoodie.properties.
> Things to watch out for:
> For some operations, not all props might be set by the user, so we might need 
> to throw an exception (record key field, partition path field, key gen prop, 
> precombine field).
> We need to fetch the latest table schema, since the incoming df could have 
> partial cols.
>  
> Downgrade step:
> hoodie.properties will have some additional properties, which should not 
> cause any harm. All we need to do is downgrade the table version to the 
> target version and not touch any of the props.
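The upgrade/downgrade decision described above can be sketched roughly as follows. The version constants and the handling here are simplified assumptions for illustration, not Hudi's actual `UpgradeDowngrade` implementation:

```java
public class TableVersionSketch {
    static final int TARGET_VERSION = 2; // hypothetical new table version

    // Upgrade runs before every write: any older version (0 or 1) moves to 2.
    static int upgrade(int currentVersion) {
        if (currentVersion >= TARGET_VERSION) {
            return currentVersion; // already up to date, nothing to do
        }
        // ... back-fill the sql-related properties into hoodie.properties here ...
        return TARGET_VERSION;
    }

    // Downgrade only rewrites the version marker; the extra properties are
    // left in place because they are harmless to older readers.
    static int downgrade(int currentVersion, int targetVersion) {
        return Math.min(currentVersion, targetVersion);
    }
}
```

Note the asymmetry: upgrading mutates the properties, while downgrading only changes the recorded version.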





[GitHub] [hudi] hudi-bot edited a comment on pull request #3413: [HUDI-2277] Let HoodieDeltaStreamer reading ORC files directly using ORCDFSSource

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636










[GitHub] [hudi] hudi-bot commented on pull request #3411: [HUDI-2276] Enable metadata table by default for readers and writers

2021-08-05 Thread GitBox


hudi-bot commented on pull request #3411:
URL: https://github.com/apache/hudi/pull/3411#issuecomment-893073002


   
   ## CI report:
   
   * e441c95e938929d79f78fb9561869bd726dd69b8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2274) Allows INSERT duplicates for Flink MOR table

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393813#comment-17393813
 ] 

ASF GitHub Bot commented on HUDI-2274:
--

danny0405 opened a new pull request #3403:
URL: https://github.com/apache/hudi/pull/3403








> Allows INSERT duplicates for Flink MOR table
> 
>
> Key: HUDI-2274
> URL: https://issues.apache.org/jira/browse/HUDI-2274
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Commented] (HUDI-2276) Enable Metadata Table by default for both writers and readers

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393814#comment-17393814
 ] 

ASF GitHub Bot commented on HUDI-2276:
--

hudi-bot commented on pull request #3411:
URL: https://github.com/apache/hudi/pull/3411#issuecomment-893073002


   
   ## CI report:
   
   * e441c95e938929d79f78fb9561869bd726dd69b8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Enable Metadata Table by default for both writers and readers
> -
>
> Key: HUDI-2276
> URL: https://issues.apache.org/jira/browse/HUDI-2276
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: configs, Performance
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
> We have had the metadata table disabled by default since Hudi 0.8.0, as it 
> was released as an experimental feature. With Hudi 0.9.0, we should enable it 
> by default to improve out-of-the-box performance.





[jira] [Commented] (HUDI-2277) Let HoodieDeltaStreamer reading ORC files using ORCDFSSource

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393812#comment-17393812
 ] 

ASF GitHub Bot commented on HUDI-2277:
--

hudi-bot edited a comment on pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#issuecomment-893311636








> Let HoodieDeltaStreamer reading ORC files using ORCDFSSource
> 
>
> Key: HUDI-2277
> URL: https://issues.apache.org/jira/browse/HUDI-2277
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Develop a new source named ORCDFSSource, extending RowSource.
> With this, HoodieDeltaStreamer can read ORC files directly using ORCDFSSource.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3402: [HUDI-2167] HoodieCompactionConfig get HoodieCleaningPolicy NullPointerException

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3402:
URL: https://github.com/apache/hudi/pull/3402#issuecomment-892505211


   
   ## CI report:
   
   * 40aa4313a56c82473828865ddcef89b550499d1e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1359)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3285: [HUDI-1771] Propagate CDC format for hoodie

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3285:
URL: https://github.com/apache/hudi/pull/3285#issuecomment-881141261










[GitHub] [hudi] hudi-bot edited a comment on pull request #3408: [MINOR] Move to ubuntu-latest for Azure CI

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3408:
URL: https://github.com/apache/hudi/pull/3408#issuecomment-892925161


   
   ## CI report:
   
   * f3c307c39786779117314c2164a3b982945784cb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1371)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2167) HoodieCompactionConfig get HoodieCleaningPolicy NullPointerException

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393816#comment-17393816
 ] 

ASF GitHub Bot commented on HUDI-2167:
--

hudi-bot edited a comment on pull request #3402:
URL: https://github.com/apache/hudi/pull/3402#issuecomment-892505211


   
   ## CI report:
   
   * 40aa4313a56c82473828865ddcef89b550499d1e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1359)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> HoodieCompactionConfig get HoodieCleaningPolicy NullPointerException
> 
>
> Key: HUDI-2167
> URL: https://issues.apache.org/jira/browse/HUDI-2167
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: CLI, Flink Integration
>Reporter: tsianglei
>Priority: Major
>  Labels: pull-request-available
>
> Caused by: java.lang.NullPointerException: Name is null
>  at java.lang.Enum.valueOf(Enum.java:236) ~[?:1.8.0_221]
>  at 
> org.apache.hudi.common.model.HoodieCleaningPolicy.valueOf(HoodieCleaningPolicy.java:24)
>  ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
>  at 
> org.apache.hudi.config.HoodieCompactionConfig$Builder.build(HoodieCompactionConfig.java:368)
>  ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
>  at 
> org.apache.hudi.util.StreamerUtil.getHoodieClientConfig(StreamerUtil.java:155)
>  ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
>  at 
> org.apache.hudi.util.StreamerUtil.createWriteClient(StreamerUtil.java:277) 
> ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
>  at 
> org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:154)
>  ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
>  at 
> org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:189)
>  ~[flink-dist_2.11-1.12.2.jar:1.12.2]
>  at 
> org.apache.flink.runtime.scheduler.SchedulerBase.startAllOperatorCoordinators(SchedulerBase.java:1253)
>  ~[flink-dist_2.11-1.12.2.jar:1.12.2]
>  at 
> org.apache.flink.runtime.scheduler.SchedulerBase.startScheduling(SchedulerBase.java:624)
>  ~[flink-dist_2.11-1.12.2.jar:1.12.2]
>  at 
> org.apache.flink.runtime.jobmaster.JobMaster.startScheduling(JobMaster.java:1032)
>  ~[flink-dist_2.11-1.12.2.jar:1.12.2]
>  at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:705) 
> ~[?:1.8.0_221]
>  ... 27 more
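The root cause is that `Enum.valueOf` throws `NullPointerException("Name is null")` when the configured policy name is absent. A minimal illustration of the failure mode and a guarded lookup with a default; the enum below is a stand-in for `HoodieCleaningPolicy`, and the default chosen is an assumption:

```java
public class CleaningPolicyLookup {
    // Stand-in for org.apache.hudi.common.model.HoodieCleaningPolicy.
    enum CleaningPolicy { KEEP_LATEST_COMMITS, KEEP_LATEST_FILE_VERSIONS }

    // Enum.valueOf(null) throws NullPointerException("Name is null"),
    // matching the stack trace above. Guard with a default instead of
    // passing the possibly-unset config value straight through.
    static CleaningPolicy parse(String configured) {
        return configured == null
                ? CleaningPolicy.KEEP_LATEST_COMMITS // assumed default
                : CleaningPolicy.valueOf(configured);
    }
}
```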





[jira] [Commented] (HUDI-1771) Propagate CDC format for hoodie

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393817#comment-17393817
 ] 

ASF GitHub Bot commented on HUDI-1771:
--

hudi-bot edited a comment on pull request #3285:
URL: https://github.com/apache/hudi/pull/3285#issuecomment-881141261








> Propagate CDC format for hoodie
> ---
>
> Key: HUDI-1771
> URL: https://issues.apache.org/jira/browse/HUDI-1771
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.9.0
>
>
> As discussed in the dev mailing list: 
> https://lists.apache.org/thread.html/r31b2d1404e4e043a5f875b78105ba6f9a801e78f265ad91242ad5eb2%40%3Cdev.hudi.apache.org%3E
> Keeping the change flags makes new use cases possible: using Hudi as the 
> unified storage format for the DWD and DWS layers.





[GitHub] [hudi] liujinhui1994 closed issue #3280: [SUPPORT] Use structedstreaming to consume kafka to write to hudi error

2021-08-05 Thread GitBox


liujinhui1994 closed issue #3280:
URL: https://github.com/apache/hudi/issues/3280


   






[GitHub] [hudi] vinothchandar commented on pull request #3325: [WIP] Fixing payload instantiation to include preCombine field in LogRecordScanner

2021-08-05 Thread GitBox


vinothchandar commented on pull request #3325:
URL: https://github.com/apache/hudi/pull/3325#issuecomment-892934035


   @danny0405 any update on #3267 ? or can we prioritize this PR over that. 






[GitHub] [hudi] danny0405 closed pull request #3403: [HUDI-2274] Allows INSERT duplicates for Flink MOR table

2021-08-05 Thread GitBox


danny0405 closed pull request #3403:
URL: https://github.com/apache/hudi/pull/3403










[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3328: [HUDI-2208] Support Bulk Insert For Spark Sql

2021-08-05 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r683207185



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -248,6 +248,14 @@ object DataSourceWriteOptions {
 .withDocumentation("When set to true, will perform write operations 
directly using the spark native " +
   "`Row` representation, avoiding any additional conversion costs.")
 
+  /**
+   * Enable the bulk insert for sql insert statement.
+   */
+  val SQL_ENABLE_BULK_INSERT:ConfigProperty[String] = ConfigProperty

Review comment:
   Sounds reasonable. CTAS uses bulk_insert by default, and regular insert is 
used for insert into by default.

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
   .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
   .toBoolean
 
-val operation = if (isOverwrite) {
-  if (table.partitionColumnNames.nonEmpty) {
-INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-  } else {
-INSERT_OPERATION_OPT_VAL
+val enableBulkInsert = 
parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+  DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+val isPartitionedTable = table.partitionColumnNames.nonEmpty
+val isPrimaryKeyTable = primaryColumns.nonEmpty
+val operation =
+  (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+case (true, true, _, _) =>
+  throw new IllegalArgumentException(s"Table with primaryKey can not 
use bulk insert.")

Review comment:
   For CTAS, we can relax this, because no data exists in the target table yet. 
We can just combine the input by pk before the bulk insert to achieve the same 
goal.

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
 
   // Convert to RDD[HoodieRecord]
   val genericRecords: RDD[GenericRecord] = 
HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || 
operation.equals(WriteOperationType.UPSERT);
+  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+operation.equals(WriteOperationType.UPSERT) ||
+
parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
   @vinothchandar  well I think INSERT_DROP_DUPS_OPT_KEY is somewhat different 
from COMBINE_BEFORE_INSERT_PROP.
   **INSERT_DROP_DUPS_OPT_KEY** is used to drop incoming records that are 
duplicates of records already in the target table.
   `COMBINE_BEFORE_INSERT_PROP` is used to combine duplicate records within the 
input.
   So they are not entirely the same config, IMO.
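The distinction between the two configs can be illustrated with a small stand-alone sketch, using plain Java maps as stand-ins for Hudi records; the method names here are invented for illustration and do not exist in Hudi:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupSemantics {
    // COMBINE_BEFORE_INSERT-style: collapse duplicate keys *within the
    // incoming batch* (here, the last occurrence wins).
    static Map<String, String> combineInput(List<String[]> incoming) {
        Map<String, String> combined = new LinkedHashMap<>();
        for (String[] rec : incoming) {
            combined.put(rec[0], rec[1]);
        }
        return combined;
    }

    // INSERT_DROP_DUPS-style: drop incoming records whose key already
    // exists in the *target table*.
    static Map<String, String> dropExisting(Map<String, String> target,
                                            Map<String, String> incoming) {
        Map<String, String> kept = new LinkedHashMap<>(incoming);
        kept.keySet().removeAll(target.keySet());
        return kept;
    }
}
```

One dedupes the input against itself; the other dedupes the input against existing data, which is why they are separate knobs.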
   

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
   .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
   .toBoolean
 
-val operation = if (isOverwrite) {
-  if (table.partitionColumnNames.nonEmpty) {
-INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-  } else {
-INSERT_OPERATION_OPT_VAL
+val enableBulkInsert = 
parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+  DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+val isPartitionedTable = table.partitionColumnNames.nonEmpty
+val isPrimaryKeyTable = primaryColumns.nonEmpty
+val operation =
+  (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+case (true, true, _, _) =>
+  throw new IllegalArgumentException(s"Table with primaryKey can not 
use bulk insert.")
+case (_, true, true, _) if isPartitionedTable =>
+  throw new IllegalArgumentException(s"Insert Overwrite Partition can 
not use bulk insert.")
+case (_, true, _, true) =>
+  throw new IllegalArgumentException(s"Bulk insert cannot support drop 
duplication." +
+s" Please disable $INSERT_DROP_DUPS_OPT_KEY and try again.")
+// if enableBulkInsert is true, use bulk insert for the insert 
overwrite non-partitioned table.
+case (_, true, true, _) if !isPartitionedTable => 
BULK_INSERT_OPERATION_OPT_VAL
+// insert overwrite partition
+case (_, _, true, _) if isPartitionedTable => 
INSERT_OVERWRITE_OPERATION_OPT_VAL

Review comment:
   Well, in spark-sql, `insert overwrite` on a partitioned table does not mean 
overwriting the whole table; it only overwrites the affected partitions.

##
File

[jira] [Commented] (HUDI-2208) [SQL] Support Bulk Insert For Spark Sql

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393821#comment-17393821
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r683207185



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -248,6 +248,14 @@ object DataSourceWriteOptions {
 .withDocumentation("When set to true, will perform write operations 
directly using the spark native " +
   "`Row` representation, avoiding any additional conversion costs.")
 
+  /**
+   * Enable the bulk insert for sql insert statement.
+   */
+  val SQL_ENABLE_BULK_INSERT:ConfigProperty[String] = ConfigProperty

Review comment:
   Sounds reasonable. CTAS uses bulk_insert by default, and regular insert is 
used for insert into by default.

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
   .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
   .toBoolean
 
-val operation = if (isOverwrite) {
-  if (table.partitionColumnNames.nonEmpty) {
-INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-  } else {
-INSERT_OPERATION_OPT_VAL
+val enableBulkInsert = 
parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+  DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+val isPartitionedTable = table.partitionColumnNames.nonEmpty
+val isPrimaryKeyTable = primaryColumns.nonEmpty
+val operation =
+  (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+case (true, true, _, _) =>
+  throw new IllegalArgumentException(s"Table with primaryKey can not 
use bulk insert.")

Review comment:
   For CTAS, we can relax this, because no data exists in the target table yet. We can simply combine the input by primary key before the bulk insert to reach the same goal.
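
   Such a combine step could look roughly like the following sketch; the primary-key and pre-combine column names are caller-supplied here and hypothetical (a real implementation would read them from the table config):

   ```scala
   import org.apache.spark.sql.DataFrame
   import org.apache.spark.sql.expressions.Window
   import org.apache.spark.sql.functions.{col, row_number}

   // Sketch: combine the CTAS input by primary key before bulk insert,
   // keeping the row with the largest pre-combine value per key.
   def combineByPrimaryKey(df: DataFrame,
                           pkCols: Seq[String],
                           preCombineCol: String): DataFrame = {
     val w = Window.partitionBy(pkCols.map(col): _*).orderBy(col(preCombineCol).desc)
     df.withColumn("_row_num", row_number().over(w))
       .where(col("_row_num") === 1)
       .drop("_row_num")
   }
   ```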

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -159,7 +159,10 @@ object HoodieSparkSqlWriter {
 
   // Convert to RDD[HoodieRecord]
   val genericRecords: RDD[GenericRecord] = 
HoodieSparkUtils.createRdd(df, schema, structName, nameSpace)
-  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean || 
operation.equals(WriteOperationType.UPSERT);
+  val shouldCombine = 
parameters(INSERT_DROP_DUPS_OPT_KEY.key()).toBoolean ||
+operation.equals(WriteOperationType.UPSERT) ||
+
parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key(),

Review comment:
   @vinothchandar  well, I think `INSERT_DROP_DUPS_OPT_KEY` is somewhat different from `COMBINE_BEFORE_INSERT_PROP`.
   **INSERT_DROP_DUPS_OPT_KEY**: used to drop input records that duplicate records already in the target table.
   `COMBINE_BEFORE_INSERT_PROP`: used to combine duplicate records within the input.
   So they are not entirely the same config, IMO.
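
   The distinction can be sketched on plain Scala collections (the record shape and data are hypothetical):

   ```scala
   // Hypothetical record: a key plus a pre-combine (ordering) field.
   case class Rec(key: String, ts: Long)

   val input    = Seq(Rec("a", 1), Rec("a", 2), Rec("b", 1))
   val existing = Set("b") // keys already present in the target table

   // COMBINE_BEFORE_INSERT_PROP: merge duplicates *within the input*,
   // keeping the record with the highest pre-combine value per key.
   val combined = input.groupBy(_.key).values.map(_.maxBy(_.ts)).toSeq
   // keeps Rec("a", 2) and Rec("b", 1)

   // INSERT_DROP_DUPS_OPT_KEY: drop input records whose key already
   // exists *in the target table*.
   val dropped = input.filterNot(r => existing.contains(r.key))
   // keeps Rec("a", 1) and Rec("a", 2)
   ```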
   

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -209,19 +209,32 @@ object InsertIntoHoodieTableCommand {
   .getOrElse(INSERT_DROP_DUPS_OPT_KEY.defaultValue)
   .toBoolean
 
-val operation = if (isOverwrite) {
-  if (table.partitionColumnNames.nonEmpty) {
-INSERT_OVERWRITE_OPERATION_OPT_VAL  // overwrite partition
-  } else {
-INSERT_OPERATION_OPT_VAL
+val enableBulkInsert = 
parameters.getOrElse(DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.key,
+  DataSourceWriteOptions.SQL_ENABLE_BULK_INSERT.defaultValue()).toBoolean
+val isPartitionedTable = table.partitionColumnNames.nonEmpty
+val isPrimaryKeyTable = primaryColumns.nonEmpty
+val operation =
+  (isPrimaryKeyTable, enableBulkInsert, isOverwrite, dropDuplicate) match {
+case (true, true, _, _) =>
+  throw new IllegalArgumentException(s"Table with primaryKey can not 
use bulk insert.")
+case (_, true, true, _) if isPartitionedTable =>
+  throw new IllegalArgumentException(s"Insert Overwrite Partition can 
not use bulk insert.")
+case (_, true, _, true) =>
+  throw new IllegalArgumentException(s"Bulk insert cannot support drop 
duplication." +
+s" Please disable $INSERT_DROP_DUPS_OPT_KEY and try again.")
+// if enableBulkInsert is true, use bulk insert for the insert 
overwrite non-partitioned table.
+case (_, true, true, _) if !isPartitionedTable => 
BULK_INSERT_OPERATION_OPT_VAL
+// insert overwrite partition
+case (_, _, true, _) if isPartitioned

[jira] [Commented] (HUDI-2274) Allows INSERT duplicates for Flink MOR table

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393820#comment-17393820
 ] 

ASF GitHub Bot commented on HUDI-2274:
--

danny0405 closed pull request #3403:
URL: https://github.com/apache/hudi/pull/3403






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allows INSERT duplicates for Flink MOR table
> 
>
> Key: HUDI-2274
> URL: https://issues.apache.org/jira/browse/HUDI-2274
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3393: [HUDI-1842] Spark Sql Support For The Exists Hoodie Table

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3393:
URL: https://github.com/apache/hudi/pull/3393#issuecomment-891812944










[jira] [Commented] (HUDI-1842) [SQL] Spark Sql Support For The Exists Hoodie Table

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393823#comment-17393823
 ] 

ASF GitHub Bot commented on HUDI-1842:
--

hudi-bot edited a comment on pull request #3393:
URL: https://github.com/apache/hudi/pull/3393#issuecomment-891812944








> [SQL] Spark Sql Support For The Exists Hoodie Table
> ---
>
> Key: HUDI-1842
> URL: https://issues.apache.org/jira/browse/HUDI-1842
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
> In order to support spark sql for hoodie, we persist some table properties to 
> the hoodie.properties. e.g. primaryKey, preCombineField, partition columns.  
> For the exists hoodie tables, these  properties are missing. We need do some 
> code in UpgradeDowngrade to support spark sql for the exists tables.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3233: [HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3233:
URL: https://github.com/apache/hudi/pull/3233#issuecomment-875280958










[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393824#comment-17393824
 ] 

ASF GitHub Bot commented on HUDI-1138:
--

hudi-bot edited a comment on pull request #3233:
URL: https://github.com/apache/hudi/pull/3233#issuecomment-875280958








> Re-implement marker files via timeline server
> -
>
> Key: HUDI-1138
> URL: https://issues.apache.org/jira/browse/HUDI-1138
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Even as you can argue that RFC-15/consolidated metadata, removes the need for 
> deleting partial files written due to spark task failures/stage retries. It 
> will still leave extra files inside the table (and users will pay for it 
> every month) and we need the marker mechanism to be able to delete these 
> partial files. 
> Here we explore if we can improve the current marker file mechanism, that 
> creates one marker file per data file written, by 
> Delegating the createMarker() call to the driver/timeline server, and have it 
> create marker metadata into a single file handle, that is flushed for 
> durability guarantees
>  
> P.S: I was tempted to think Spark listener mechanism can help us deal with 
> failed tasks, but it has no guarantees. the writer job could die without 
> deleting a partial file. i.e it can improve things, but cant provide 
> guarantees 





[GitHub] [hudi] nsivabalan merged pull request #3398: [HUDI-2273] Moving some long running tests to functional

2021-08-05 Thread GitBox


nsivabalan merged pull request #3398:
URL: https://github.com/apache/hudi/pull/3398


   






[GitHub] [hudi] xushiyan commented on a change in pull request #3398: [HUDI-2273] Moving some long running tests to functional

2021-08-05 Thread GitBox


xushiyan commented on a change in pull request #3398:
URL: https://github.com/apache/hudi/pull/3398#discussion_r683038895



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/SparkClientFunctionalTestSuite.java
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client.functional;
+
+import org.junit.platform.runner.JUnitPlatform;
+import org.junit.platform.suite.api.IncludeTags;
+import org.junit.platform.suite.api.SelectPackages;
+import org.junit.runner.RunWith;
+
+@RunWith(JUnitPlatform.class)
+@SelectPackages("org.apache.hudi.client")
+@IncludeTags("functional")
+public class SparkClientFunctionalTestSuite {

Review comment:
   Yes, this looks good. `*FunctionalTestSuite.java` is the entrypoint for functional tests in each module; it is defined in the pom.xml functional-test profile.

##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/SparkClientFunctionalTestSuite.java
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client.functional;
+
+import org.junit.platform.runner.JUnitPlatform;
+import org.junit.platform.suite.api.IncludeTags;
+import org.junit.platform.suite.api.SelectPackages;
+import org.junit.runner.RunWith;
+
+@RunWith(JUnitPlatform.class)
+@SelectPackages("org.apache.hudi.client")
+@IncludeTags("functional")
+public class SparkClientFunctionalTestSuite {

Review comment:
   I think one suite should suffice, since `@SelectPackages` can handle multiple package paths. But please give it a try; I haven't done it myself :)

##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/SparkClientFunctionalTestSuite.java
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client.functional;
+
+import org.junit.platform.runner.JUnitPlatform;
+import org.junit.platform.suite.api.IncludeTags;
+import org.junit.platform.suite.api.SelectPackages;
+import org.junit.runner.RunWith;
+
+@RunWith(JUnitPlatform.class)
+@SelectPackages("org.apache.hudi.client")
+@IncludeTags("functional")
+public class SparkClientFunctionalTestSuite {

Review comment:
   I think one suite should suffice, since `@SelectPackages` can handle multiple package paths. But please give it a try; I haven't done it myself :). (Or even if the package paths are the same for Java and Scala, it might still work.)
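
   A sketch of a suite selecting more than one package (written in Scala here; the second package path is hypothetical, and as the comment says, this is untested):

   ```scala
   import org.junit.platform.runner.JUnitPlatform
   import org.junit.platform.suite.api.{IncludeTags, SelectPackages}
   import org.junit.runner.RunWith

   // Sketch: @SelectPackages accepts multiple package paths, so one suite
   // class can cover several test packages. The second path is hypothetical.
   @RunWith(classOf[JUnitPlatform])
   @SelectPackages(Array("org.apache.hudi.client", "org.apache.hudi.table"))
   @IncludeTags(Array("functional"))
   class MultiPackageFunctionalTestSuite
   ```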





[GitHub] [hudi] hudi-bot edited a comment on pull request #3401: [HUDI-2170] Always choose the latest record for HoodieRecordPayload

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #3401:
URL: https://github.com/apache/hudi/pull/3401#issuecomment-892472052










[jira] [Commented] (HUDI-2170) Always choose the latest record for HoodieRecordPayload

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393830#comment-17393830
 ] 

ASF GitHub Bot commented on HUDI-2170:
--

hudi-bot edited a comment on pull request #3401:
URL: https://github.com/apache/hudi/pull/3401#issuecomment-892472052








> Always choose the latest record for HoodieRecordPayload
> ---
>
> Key: HUDI-2170
> URL: https://issues.apache.org/jira/browse/HUDI-2170
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Now in {{OverwriteWithLatestAvroPayload.preCombine}}, we still choose the old 
> record when the new record has the same preCombine field with the old one, 
> actually it is more natural to keep the new incoming record instead. The 
> {{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already did 
> that.
> See issue: https://github.com/apache/hudi/issues/3266.





[jira] [Commented] (HUDI-2273) Bring down the total test run time with CI

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393829#comment-17393829
 ] 

ASF GitHub Bot commented on HUDI-2273:
--

xushiyan commented on a change in pull request #3398:
URL: https://github.com/apache/hudi/pull/3398#discussion_r683038895



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/SparkClientFunctionalTestSuite.java
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client.functional;
+
+import org.junit.platform.runner.JUnitPlatform;
+import org.junit.platform.suite.api.IncludeTags;
+import org.junit.platform.suite.api.SelectPackages;
+import org.junit.runner.RunWith;
+
+@RunWith(JUnitPlatform.class)
+@SelectPackages("org.apache.hudi.client")
+@IncludeTags("functional")
+public class SparkClientFunctionalTestSuite {

Review comment:
   Yes, this looks good. `*FunctionalTestSuite.java` is the entrypoint for functional tests in each module; it is defined in the pom.xml functional-test profile.

##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/SparkClientFunctionalTestSuite.java
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client.functional;
+
+import org.junit.platform.runner.JUnitPlatform;
+import org.junit.platform.suite.api.IncludeTags;
+import org.junit.platform.suite.api.SelectPackages;
+import org.junit.runner.RunWith;
+
+@RunWith(JUnitPlatform.class)
+@SelectPackages("org.apache.hudi.client")
+@IncludeTags("functional")
+public class SparkClientFunctionalTestSuite {

Review comment:
   I think one suite should suffice, since `@SelectPackages` can handle multiple package paths. But please give it a try; I haven't done it myself :)

##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/SparkClientFunctionalTestSuite.java
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client.functional;
+
+import org.junit.platform.runner.JUnitPlatform;
+import org.junit.platform.suite.api.IncludeTags;
+import org.junit.platform.suite.api.SelectPackages;
+import org.junit.runner.RunWith;
+
+@RunWith(JUnitPlatform.class)
+@SelectPackages("org.apache.hudi.client")
+@IncludeTags("functional")
+public class SparkClientFunctionalTestSuite {

Review comment:
   I think one suite should suffice, since `@SelectPackages` can handle multiple package paths. But please give it a try; I haven't done it myself

[jira] [Commented] (HUDI-2273) Bring down the total test run time with CI

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393827#comment-17393827
 ] 

ASF GitHub Bot commented on HUDI-2273:
--

nsivabalan merged pull request #3398:
URL: https://github.com/apache/hudi/pull/3398


   




> Bring down the total test run time with CI
> --
>
> Key: HUDI-2273
> URL: https://issues.apache.org/jira/browse/HUDI-2273
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Bring down the total test run time with CI. As of now, utilities, 
> spark-client and rest are taking > 50 mins. 





[GitHub] [hudi] hudi-bot commented on pull request #3403: [HUDI-2274] Allows INSERT duplicates for Flink MOR table

2021-08-05 Thread GitBox


hudi-bot commented on pull request #3403:
URL: https://github.com/apache/hudi/pull/3403#issuecomment-892564001


   
   ## CI report:
   
   * 7054bac54ab01214b060ed22332149f41dff1ad3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #3393: [HUDI-1842] Spark Sql Support For The Exists Hoodie Table

2021-08-05 Thread GitBox


pengzhiwei2018 edited a comment on pull request #3393:
URL: https://github.com/apache/hudi/pull/3393#issuecomment-893180101


   > > > > Hey peng. I did a round of testing on this patch. Here are my 
findings.
   > > > > Insert into is till prefixing col name to meta fields. (3rd col and 
4th col)
   > > > > ```
   > > > > select * from hudi_ny where tpep_pickup_datetime like '%00:04:03%';
   > > > > 20210802105420   20210802105420_2_23 2019-01-01 00:04:03 
2019-01-01  
c5e6a617-dfc5-4051-8c1a-8daead3847af-0_2-37-62_20210802105420.parquet   2   
2019-01-01 00:04:03 2019-01-01 00:11:48 1   3.011   N   
137 262 1   10.00.5 0.5 2.260.0 0.3 13.56   
NULL2019-01-01
   > > > > 20210803162030   20210803162030_0_1  
tpep_pickup_datetime:2021-01-01 00:04:03date_col=2021-01-01 
c5c72f9e-9a63-48ca-a981-4302890f5210-0_0-27-1635_20210803162030.parquet 2   
2021-01-01 00:04:03 2021-01-01 00:11:48 1   3.011   N   
137 262 10.00.5 0.5 2.260.0 0.3 13.56   NULL
2021-01-01
   > > > > Time taken: 0.524 seconds, Fetched 2 row(s)
   > > > > ```
   > > > > 
   > > > > 
   > > > > 
   > > > >   
   > > > > 
   > > > > 
   > > > >   
   > > > > 
   > > > > 
   > > > > 
   > > > >   
   > > > > 1st row was part of the table before onboarding to spark-sql.
   > > > > 2nd row was inserted using insert into.
   > > > > Hi @nsivabalan , I know the difference now. Spark SQL uses the `SqlKeyGenerator`, a subclass of `ComplexKeyGenerator`, to generate the record key, and it adds the column name to the record key, while `SimpleKeyGenerator` does not. So we should keep the behavior the same for `ComplexKeyGenerator` and `SimpleKeyGenerator`.
   > > > 
   > > > 
   > > > Sorry, I don't get you. I understand SqlKeyGenerator extends ComplexKeyGen, but why do we need to keep the behavior the same for SimpleKeyGen? We should not add any field prefix for SimpleKeyGen; otherwise, no updates will work for an existing table.
   > > 
   > > 
   > > Hi @nsivabalan , I have fixed the record-key mismatch issue. Please take a test again~
   > 
   > @pengzhiwei2018 I tested the patch. I can see the column names are no longer being prefixed. Updates and deletes by record key are working fine now. However, the URI encoding of the partition path is still an issue. For example, I did an insert to an existing partition. The insert was successful, but it created a new partition as below:
   > 
   > ```
   > insert into hudi_trips_cow values(1.0, 2.0, "driver_2", 3.0, 4.0, 100.0, 
"rider_2", 12345, "765544i-e89b-12d3-a456-42665544", 
"americas/united_states/san_francisco/");
   > 
   > % ls -l /private/tmp/hudi_trips_cow
   > total 0
   > drwxr-xr-x  4 sagars  wheel  128 Aug  4 16:49 americas
   > drwxr-xr-x  6 sagars  wheel  192 Aug  4 16:50 
americas%2Funited_states%2Fsan_francisco%2F
   > drwxr-xr-x  3 sagars  wheel   96 Aug  4 16:49 asia
   > ```
   
   Hi @codope , can you drop the table and create it again with the latest code of this patch? I am afraid this happened because you created the table with the old code of the patch. I have fixed this issue in the latest code.
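
   For what it's worth, the extra directory name is consistent with the slash-containing partition value being URL-encoded as one partition-path segment, e.g.:

   ```scala
   import java.net.URLEncoder

   // Sketch: "/" inside the partition value becomes "%2F" when the value
   // is encoded as a single path segment, producing one odd directory name.
   val partitionValue = "americas/united_states/san_francisco/"
   val encoded = URLEncoder.encode(partitionValue, "UTF-8")
   // encoded == "americas%2Funited_states%2Fsan_francisco%2F"
   ```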






[jira] [Commented] (HUDI-2274) Allows INSERT duplicates for Flink MOR table

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393831#comment-17393831
 ] 

ASF GitHub Bot commented on HUDI-2274:
--

hudi-bot commented on pull request #3403:
URL: https://github.com/apache/hudi/pull/3403#issuecomment-892564001


   
   ## CI report:
   
   * 7054bac54ab01214b060ed22332149f41dff1ad3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Allows INSERT duplicates for Flink MOR table
> 
>
> Key: HUDI-2274
> URL: https://issues.apache.org/jira/browse/HUDI-2274
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Commented] (HUDI-1985) Website re-design implementation

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393834#comment-17393834
 ] 

ASF GitHub Bot commented on HUDI-1985:
--

vinothchandar commented on pull request #3405:
URL: https://github.com/apache/hudi/pull/3405#issuecomment-892845002


   Woot!




> Website re-design implementation
> 
>
> Key: HUDI-1985
> URL: https://issues.apache.org/jira/browse/HUDI-1985
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Assignee: Vinoth Govindarajan
>Priority: Blocker
>  Labels: documentation, pull-request-available
> Fix For: 0.9.0
>
>
> To provide better navigation and organization of Hudi website's info, we have 
> done a re-design of the web pages.
> Previous discussion
> [https://github.com/apache/hudi/issues/2905]
>  
> See the wireframe and final design in 
> [https://www.figma.com/file/tipod1JZRw7anZRWBI6sZh/Hudi.Apache?node-id=32%3A6]
> (login Figma to comment)
> The design is ready for implementation.





[GitHub] [hudi] swuferhong closed pull request #3285: [HUDI-1771] Propagate CDC format for hoodie

2021-08-05 Thread GitBox


swuferhong closed pull request #3285:
URL: https://github.com/apache/hudi/pull/3285


   






[GitHub] [hudi] hudi-bot edited a comment on pull request #2927: [HUDI-1129] Improving schema evolution support in hudi

2021-08-05 Thread GitBox


hudi-bot edited a comment on pull request #2927:
URL: https://github.com/apache/hudi/pull/2927#issuecomment-864700767










[jira] [Commented] (HUDI-1771) Propagate CDC format for hoodie

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393835#comment-17393835
 ] 

ASF GitHub Bot commented on HUDI-1771:
--

swuferhong closed pull request #3285:
URL: https://github.com/apache/hudi/pull/3285


   




> Propagate CDC format for hoodie
> ---
>
> Key: HUDI-1771
> URL: https://issues.apache.org/jira/browse/HUDI-1771
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.9.0
>
>
> Like what we discussed in the dev mailing list: 
> https://lists.apache.org/thread.html/r31b2d1404e4e043a5f875b78105ba6f9a801e78f265ad91242ad5eb2%40%3Cdev.hudi.apache.org%3E
> Keeping the change flags makes new use cases possible: using Hudi as the
> unified storage format for the DWD and DWS layers.
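
For context, the "change flags" discussed above follow the same semantics as Flink's RowKind (`+I` insert, `-U` update-before, `+U` update-after, `-D` delete). The sketch below is illustrative only, not Hudi or Flink code; it shows why keeping the flag with each record lets a downstream DWD/DWS consumer replay changes rather than seeing only the latest merged image:

```python
# Illustrative model of changelog (change-flag) semantics.
# Flags: +I insert, -U update-before, +U update-after, -D delete.
def apply_changelog(state, changelog):
    """Replay (flag, key, value) records into a key -> value state map."""
    for flag, key, value in changelog:
        if flag in ("+I", "+U"):
            state[key] = value          # add the new image of the row
        elif flag in ("-U", "-D"):
            state.pop(key, None)        # retract the old image of the row
    return state

log = [
    ("+I", "k1", 1),   # insert k1 = 1
    ("-U", "k1", 1),   # retract old image before the update
    ("+U", "k1", 2),   # new image after the update
    ("+I", "k2", 9),
    ("-D", "k2", 9),   # delete k2
]
assert apply_changelog({}, log) == {"k1": 2}
```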





[GitHub] [hudi] nsivabalan commented on pull request #3412: [MINOR] Moving some scala tests in hudi-spark to functional

2021-08-05 Thread GitBox


nsivabalan commented on pull request #3412:
URL: https://github.com/apache/hudi/pull/3412#issuecomment-893160317










[GitHub] [hudi] nsivabalan commented on pull request #3398: [HUDI-2273] Moving some long running tests to functional

2021-08-05 Thread GitBox


nsivabalan commented on pull request #3398:
URL: https://github.com/apache/hudi/pull/3398#issuecomment-892709099










[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3387: [HUDI-2233] Use HMS To Sync Hive Meta For Spark Sql

2021-08-05 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3387:
URL: https://github.com/apache/hudi/pull/3387#discussion_r682621582



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -399,6 +400,11 @@ object DataSourceWriteOptions {
 .defaultValue(1000)
 .withDocumentation("The number of partitions one batch when synchronous 
partitions to hive.")
 
+  val HIVE_SYNC_MODE: ConfigProperty[String] = ConfigProperty
+.key("hoodie.datasource.hive_sync.mode")
+.noDefaultValue()

Review comment:
   Currently `jdbc` is the default value for Spark SQL. But for the Spark 
datasource, if we set `jdbc` as the default here, the "useJdbc" config would 
no longer take effect; see the logic in `HoodieHiveClient`. So it would 
affect existing Spark datasource jobs.
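
The precedence concern above can be sketched in a small, hypothetical Python model (this is not Hudi's actual code; only the two `hoodie.datasource.hive_sync.*` keys are real Hudi options). A default on the mode key means the mode is always "set", so the legacy `use_jdbc` fallback branch is never reached:

```python
# Hypothetical sketch of sync-mode resolution, not Hudi's implementation.
def resolve_sync_mode(options, mode_default=None):
    mode = options.get("hoodie.datasource.hive_sync.mode", mode_default)
    if mode is not None:
        return mode  # an explicit (or defaulted) mode always wins
    # Legacy fallback, only reachable when the mode key has no default.
    use_jdbc = options.get("hoodie.datasource.hive_sync.use_jdbc", "true")
    return "jdbc" if use_jdbc == "true" else "hms"

# With no default, the old flag still works:
assert resolve_sync_mode(
    {"hoodie.datasource.hive_sync.use_jdbc": "false"}) == "hms"
# With "jdbc" as the default, the old flag would be silently ignored:
assert resolve_sync_mode(
    {"hoodie.datasource.hive_sync.use_jdbc": "false"},
    mode_default="jdbc") == "jdbc"
```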








[GitHub] [hudi] pengzhiwei2018 commented on pull request #3360: [HUDI-2243] Support Time Travel Query For Hoodie Table

2021-08-05 Thread GitBox


pengzhiwei2018 commented on pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#issuecomment-892568554


   > > > @pengzhiwei2018 Thanks for this. Maybe we could have a couple of 
   > > > follow-on tasks.
   > > > 
   > > > 1. Allow the user to specify `as.of.instant` in formats from 
   > > > `yyyy-MM-dd` to `yyyy-MM-dd hh:mm:ss`.
   > > > 2. Support this with SQL DML as well, e.g. `select a,b,c from 
   > > > hudi_table AS OF 20210728141108`. This would really help users to 
   > > > roll back using CTAS directly. What do you think?
   > > 
   > > 
   > > Both agreed! I will submit a PR to support time travel queries for 
   > > Spark SQL after #3277 has merged, as we need to do some SQL extension 
   > > work for the `hudi-spark` module based on that PR, which did a lot of 
   > > the groundwork.
   > > For question 1, this PR already supports it!
   > 
   > @pengzhiwei2018 Thanks for quickly adding the first suggestion. This diff 
   > looks good to me. Can you resolve the conflicts and then we can land it?
   
   The PR has been updated to resolve the conflicts. Please take another look~
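
As an aside, the format flexibility requested in point 1 amounts to normalizing the user-facing `as.of.instant` strings to Hudi's compact instant form (`yyyyMMddHHmmss`). The sketch below is illustrative only and is not Hudi's parsing code:

```python
from datetime import datetime

# Illustrative normalization of the "as.of.instant" formats discussed above
# to the compact instant form. Not Hudi's actual implementation.
def normalize_as_of_instant(value: str) -> str:
    for fmt in ("%Y%m%d%H%M%S",      # compact instant, e.g. 20210728141108
                "%Y-%m-%d %H:%M:%S", # yyyy-MM-dd hh:mm:ss
                "%Y-%m-%d"):         # yyyy-MM-dd (midnight assumed)
        try:
            return datetime.strptime(value, fmt).strftime("%Y%m%d%H%M%S")
        except ValueError:
            continue
    raise ValueError(f"unrecognized as.of.instant: {value}")

assert normalize_as_of_instant("2021-07-28") == "20210728000000"
assert normalize_as_of_instant("2021-07-28 14:11:08") == "20210728141108"
assert normalize_as_of_instant("20210728141108") == "20210728141108"
```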






[jira] [Commented] (HUDI-2273) Bring down the total test run time with CI

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393839#comment-17393839
 ] 

ASF GitHub Bot commented on HUDI-2273:
--

nsivabalan commented on pull request #3398:
URL: https://github.com/apache/hudi/pull/3398#issuecomment-892709099








> Bring down the total test run time with CI
> --
>
> Key: HUDI-2273
> URL: https://issues.apache.org/jira/browse/HUDI-2273
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Bring down the total test run time in CI. As of now, utilities, 
> spark-client, and the rest are taking > 50 mins.





[jira] [Commented] (HUDI-1129) AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393838#comment-17393838
 ] 

ASF GitHub Bot commented on HUDI-1129:
--

hudi-bot edited a comment on pull request #2927:
URL: https://github.com/apache/hudi/pull/2927#issuecomment-864700767








> AvroConversionUtils unable to handle avro to row transformation when passing 
> evolved schema 
> 
>
> Key: HUDI-1129
> URL: https://issues.apache.org/jira/browse/HUDI-1129
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> Unit test to repro : 
> [https://github.com/apache/hudi/pull/1844/files#diff-2c3763c5782af9c3cbc02e2935211587R476]
> Context in : 
> [https://github.com/apache/hudi/issues/1845#issuecomment-665180775] (issue 2)





[jira] [Commented] (HUDI-2233) [SQL] Hive sync is not working

2021-08-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393840#comment-17393840
 ] 

ASF GitHub Bot commented on HUDI-2233:
--

pengzhiwei2018 commented on a change in pull request #3387:
URL: https://github.com/apache/hudi/pull/3387#discussion_r682621582



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -399,6 +400,11 @@ object DataSourceWriteOptions {
 .defaultValue(1000)
 .withDocumentation("The number of partitions one batch when synchronous 
partitions to hive.")
 
+  val HIVE_SYNC_MODE: ConfigProperty[String] = ConfigProperty
+.key("hoodie.datasource.hive_sync.mode")
+.noDefaultValue()

Review comment:
   Currently `jdbc` is the default value for Spark SQL. But for the Spark 
datasource, if we set `jdbc` as the default here, the "useJdbc" config would 
no longer take effect; see the logic in `HoodieHiveClient`. So it would 
affect existing Spark datasource jobs.






> [SQL] Hive sync is not working
> --
>
> Key: HUDI-2233
> URL: https://issues.apache.org/jira/browse/HUDI-2233
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
>  
> {code:java}
> java.lang.NoClassDefFoundError: org/apache/calcite/rel/type/RelDataTypeSystem 
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:318)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:484) at 
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) at 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:458)
>  at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:448)
>  at 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:426)
>  at 
> org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:322) 
> at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:230)
> {code}
>  




