[GitHub] [hudi] xushiyan commented on a change in pull request #3671: [HUDI-2418] add HiveSchemaProvider

2021-11-02 Thread GitBox


xushiyan commented on a change in pull request #3671:
URL: https://github.com/apache/hudi/pull/3671#discussion_r741649195



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
##
@@ -164,7 +164,7 @@ public synchronized void runBeforeEach() {
   SparkConf sparkConf = conf();
   SparkRDDWriteClient.registerClasses(sparkConf);
   HoodieReadClient.addHoodieSupport(sparkConf);
-  spark = SparkSession.builder().config(sparkConf).getOrCreate();
+  spark = SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate();

Review comment:
   Instead of changing this for all tests, can you override 
`org.apache.hudi.testutils.providers.SparkProvider#conf()` in your specific 
test class to pass in the configs you need for your test case?
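
The reviewer's suggestion, in pattern form: keep the shared harness Hive-free and override the config hook only in the test class that needs Hive. The `BaseHarness`/`HiveTestHarness` classes below are a stdlib stand-in for `SparkProvider`/`SparkConf`, not Hudi's actual test classes; `spark.sql.catalogImplementation=hive` is the setting `SparkSession.Builder#enableHiveSupport()` applies.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfOverrideSketch {

    // Stand-in for SparkProvider#conf(): the shared harness supplies defaults.
    static class BaseHarness {
        Map<String, String> conf() {
            Map<String, String> conf = new HashMap<>();
            conf.put("spark.app.name", "hudi-tests"); // illustrative default
            return conf;
        }
    }

    // Only the Hive-specific test class opts in; every other test keeps the
    // default conf instead of paying for Hive support it does not need.
    static class HiveTestHarness extends BaseHarness {
        @Override
        Map<String, String> conf() {
            Map<String, String> conf = super.conf();
            conf.put("spark.sql.catalogImplementation", "hive");
            return conf;
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseHarness().conf().containsKey("spark.sql.catalogImplementation")); // prints "false"
        System.out.println(new HiveTestHarness().conf().get("spark.sql.catalogImplementation")); // prints "hive"
    }
}
```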

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/HiveSchemaProvider.java
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hudi.utilities.schema;
+
+import org.apache.avro.Schema;
+import org.apache.hudi.AvroConversionUtils;
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.catalyst.TableIdentifier;
+import org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.types.StructType;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+public class HiveSchemaProvider extends SchemaProvider {
+
+  /**
+   * Configs supported.
+   */
+  public static class Config {
+    private static final String SOURCE_SCHEMA_DATABASE_PROP = "hoodie.deltastreamer.schemaprovider.source.schema.hive.database";
+    private static final String SOURCE_SCHEMA_TABLE_PROP = "hoodie.deltastreamer.schemaprovider.source.schema.hive.table";
+    private static final String TARGET_SCHEMA_DATABASE_PROP = "hoodie.deltastreamer.schemaprovider.target.schema.hive.database";
+    private static final String TARGET_SCHEMA_TABLE_PROP = "hoodie.deltastreamer.schemaprovider.target.schema.hive.table";
+  }
+
+  private static final Logger LOG = LogManager.getLogger(HiveSchemaProvider.class);
+
+  private final Schema sourceSchema;
+
+  private Schema targetSchema;
+
+  public HiveSchemaProvider(TypedProperties props, JavaSparkContext jssc) {
+    super(props, jssc);
+    DataSourceUtils.checkRequiredProperties(props, Collections.singletonList(Config.SOURCE_SCHEMA_TABLE_PROP));
+    String sourceSchemaDBName = props.getString(Config.SOURCE_SCHEMA_DATABASE_PROP, "default");
+    String sourceSchemaTableName = props.getString(Config.SOURCE_SCHEMA_TABLE_PROP);
+    SparkSession spark = SparkSession.builder().config(jssc.getConf()).enableHiveSupport().getOrCreate();
+    try {
+      TableIdentifier sourceSchemaTable = new TableIdentifier(sourceSchemaTableName, scala.Option.apply(sourceSchemaDBName));
+      StructType sourceSchema = spark.sessionState().catalog().getTableMetadata(sourceSchemaTable).schema();
+
+      this.sourceSchema = AvroConversionUtils.convertStructTypeToAvroSchema(
+          sourceSchema,
+          sourceSchemaTableName,
+          "hoodie." + sourceSchemaDBName);
+
+      if (props.containsKey(Config.TARGET_SCHEMA_TABLE_PROP)) {
+        String targetSchemaDBName = props.getString(Config.TARGET_SCHEMA_DATABASE_PROP, "default");
+        String targetSchemaTableName = props.getString(Config.TARGET_SCHEMA_TABLE_PROP);
+        TableIdentifier targetSchemaTable = new TableIdentifier(targetSchemaTableName, scala.Option.apply(targetSchemaDBName));
+        StructType targetSchema = spark.sessionState().catalog().getTableMetadata(targetSchemaTable).schema();
+        this.targetSchema = AvroConversionUtils.convertStructTypeToAvroSchema(
+            targetSchema,
+            targetSchemaTableName,
+            "hoodie." + target

[jira] [Updated] (HUDI-2649) Kick off all the Hive query issues for 0.10.0

2021-11-02 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-2649:
-
Fix Version/s: 0.10.0

> Kick off all the Hive query issues for 0.10.0
> -
>
> Key: HUDI-2649
> URL: https://issues.apache.org/jira/browse/HUDI-2649
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Priority: Major
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2649) Kick off all the Hive query issues for 0.10.0

2021-11-02 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-2649:
-
Priority: Blocker  (was: Major)

> Kick off all the Hive query issues for 0.10.0
> -
>
> Key: HUDI-2649
> URL: https://issues.apache.org/jira/browse/HUDI-2649
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Priority: Blocker
> Fix For: 0.10.0
>
>






[GitHub] [hudi] danny0405 commented on a change in pull request #3912: [HUDI-2665] Fix overflow of huge log file in HoodieLogFormatWriter

2021-11-02 Thread GitBox


danny0405 commented on a change in pull request #3912:
URL: https://github.com/apache/hudi/pull/3912#discussion_r741647600



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java
##
@@ -148,10 +148,11 @@ public AppendResult appendBlocks(List<HoodieLogBlock> blocks) throws IOException
 HoodieLogFormat.LogFormatVersion currentLogFormatVersion =
 new HoodieLogFormatVersion(HoodieLogFormat.CURRENT_VERSION);
 
-FSDataOutputStream outputStream = getOutputStream();
-long startPos = outputStream.getPos();
+FSDataOutputStream originalOutputStream = getOutputStream();

Review comment:
   Can we add some tests for it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (HUDI-2660) Delete the view storage properties first before creation

2021-11-02 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2660.
--
Resolution: Fixed

Fixed via master branch: 7fc7e9b2bc6c5aeabd6f490376e9e0ae76e07874

> Delete the view storage properties first before creation
> 
>
> Key: HUDI-2660
> URL: https://issues.apache.org/jira/browse/HUDI-2660
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[hudi] branch master updated (5517d29 -> 7fc7e9b)

2021-11-02 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 5517d29  [HUDI-2674] hudi hive reader should not print read values. 
(#3910)
 add 7fc7e9b  [HUDI-2660] Delete the view storage properties first before 
creation (#3899)

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java | 2 +-
 .../apache/hudi/sink/bootstrap/batch/BatchBootstrapOperator.java| 6 ++
 .../src/main/java/org/apache/hudi/util/ViewStorageProperties.java   | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)


[GitHub] [hudi] danny0405 merged pull request #3899: [HUDI-2660] Delete the view storage properties first before creation

2021-11-02 Thread GitBox


danny0405 merged pull request #3899:
URL: https://github.com/apache/hudi/pull/3899


   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3912: [HUDI-2665] Fix overflow of huge log file in HoodieLogFormatWriter

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3912:
URL: https://github.com/apache/hudi/pull/3912#issuecomment-958682811


   
   ## CI report:
   
   * abbd66373198288c79bd9cde7b9d30c769c1dce3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3094)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-96) Use Command line options instead of positional arguments when launching spark applications from various CLI commands

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-96?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-96:

Labels: newbie pull-request-available sev:normal  (was: newbie 
pull-request-available sev:normal user-support-issues)

> Use Command line options instead of positional arguments when launching spark 
> applications from various CLI commands
> 
>
> Key: HUDI-96
> URL: https://issues.apache.org/jira/browse/HUDI-96
> Project: Apache Hudi
>  Issue Type: Task
>  Components: CLI, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: newbie, pull-request-available, sev:normal
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hoodie CLI commands like compaction/rollback/repair/savepoints/parquet-import 
> relies on launching a spark application to perform their operations (look at 
> SparkMain.java). 
> SparkMain (Look at SparkMain.main()) relies on positional arguments for 
> passing  various CLI options. Instead we should define proper CLI options in 
> SparkMain and use them (using Jcommander)  to improve readability and avoid 
> accidental errors at call sites. For e.g : See 
> com.uber.hoodie.utilities.HoodieCompactor
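
The switch the ticket proposes, from positional arguments to self-describing options (via JCommander), can be sketched with a stdlib-only parser; the option names `--command` and `--base-path` are illustrative, not SparkMain's actual options.

```java
import java.util.HashMap;
import java.util.Map;

public class NamedOptionsSketch {

    // Positional style: the launcher must know argv[0] is the command,
    // argv[1] the base path, etc. -- fragile and easy to get wrong at call sites.
    // Named style: each value is preceded by a flag, so order no longer matters.
    static Map<String, String> parseNamed(String[] argv) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i + 1 < argv.length; i += 2) {
            if (argv[i].startsWith("--")) {
                opts.put(argv[i].substring(2), argv[i + 1]);
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parseNamed(
            new String[] {"--command", "compact", "--base-path", "/tmp/hudi_table"});
        System.out.println(opts.get("command")); // prints "compact"
    }
}
```

A real implementation would let JCommander handle validation, defaults, and usage text; the sketch only shows why named options read better than positions.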





[jira] [Updated] (HUDI-96) Use Command line options instead of positional arguments when launching spark applications from various CLI commands

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-96?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-96:

Issue Type: Task  (was: Improvement)

> Use Command line options instead of positional arguments when launching spark 
> applications from various CLI commands
> 
>
> Key: HUDI-96
> URL: https://issues.apache.org/jira/browse/HUDI-96
> Project: Apache Hudi
>  Issue Type: Task
>  Components: CLI, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: newbie, pull-request-available, sev:normal, 
> user-support-issues
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hoodie CLI commands like compaction/rollback/repair/savepoints/parquet-import 
> relies on launching a spark application to perform their operations (look at 
> SparkMain.java). 
> SparkMain (Look at SparkMain.main()) relies on positional arguments for 
> passing  various CLI options. Instead we should define proper CLI options in 
> SparkMain and use them (using Jcommander)  to improve readability and avoid 
> accidental errors at call sites. For e.g : See 
> com.uber.hoodie.utilities.HoodieCompactor





[GitHub] [hudi] hudi-bot commented on pull request #3912: [HUDI-2665] Fix overflow of huge log file in HoodieLogFormatWriter

2021-11-02 Thread GitBox


hudi-bot commented on pull request #3912:
URL: https://github.com/apache/hudi/pull/3912#issuecomment-958682811


   
   ## CI report:
   
   * abbd66373198288c79bd9cde7b9d30c769c1dce3 UNKNOWN
   
   
   






[jira] [Commented] (HUDI-1278) Need a generic payload class which can skip late arriving data based on specific fields

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437753#comment-17437753
 ] 

Sagar Sumit commented on HUDI-1278:
---

[~shenhong] Any update on this ticket?

> Need a generic payload class which can skip late arriving data based on 
> specific fields
> ---
>
> Key: HUDI-1278
> URL: https://issues.apache.org/jira/browse/HUDI-1278
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: shenh062326
>Priority: Major
>  Labels: sev:normal, user-support-issues
> Fix For: 0.10.0
>
>
> Context : 
> [https://lists.apache.org/thread.html/rd5d805d29c2f704d8ff2729457d27bca42e890bc01fc8e5e1f1943e3%40%3Cdev.hudi.apache.org%3E]
> We need to implement a Payload class (like OverwriteWithLatestAvroPayload) 
> which will skip late arriving data.
> Notes:
>  # combineAndGetUpdateValue() would need work
>  # The ordering needs to be specified based on 1 or more fields and should be 
> configurable.
>  
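
The requested merge semantics can be modeled outside Hudi's payload API: an incoming record replaces the stored one only when its ordering field is at least as new. The `Map`-based records and the field name `ts` below are illustrative stand-ins, not Hudi's actual record types.

```java
import java.util.Map;

public class SkipLateDataSketch {

    // Simplified model of combineAndGetUpdateValue(): apply the incoming
    // record only if its ordering value is not older than the stored one;
    // otherwise keep the stored record, i.e. skip the late-arriving data.
    static Map<String, Object> combine(Map<String, Object> stored,
                                       Map<String, Object> incoming,
                                       String orderingField) {
        @SuppressWarnings("unchecked")
        Comparable<Object> storedOrder = (Comparable<Object>) stored.get(orderingField);
        return storedOrder.compareTo(incoming.get(orderingField)) > 0 ? stored : incoming;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = Map.of("ts", 5, "value", "current");
        Map<String, Object> late = Map.of("ts", 3, "value", "stale");
        System.out.println(combine(stored, late, "ts").get("value")); // prints "current"
    }
}
```

Per the ticket's second note, a real payload would read the ordering field name(s) from configuration rather than hard-coding them.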





[jira] [Assigned] (HUDI-1569) Add Flink examples to QuickStartUtils and Docker demo page

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-1569:
-

Assignee: Danny Chen  (was: vinoyang)

> Add Flink examples to QuickStartUtils and Docker demo page
> --
>
> Key: HUDI-1569
> URL: https://issues.apache.org/jira/browse/HUDI-1569
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: sivabalan narayanan
>Assignee: Danny Chen
>Priority: Major
>  Labels: user-support-issues
>
> Add Flink examples to QuickStartUtils and Docker demo page. 





[jira] [Commented] (HUDI-1569) Add Flink examples to QuickStartUtils and Docker demo page

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437752#comment-17437752
 ] 

Sagar Sumit commented on HUDI-1569:
---

[~danny0405] Assigning this to you.

> Add Flink examples to QuickStartUtils and Docker demo page
> --
>
> Key: HUDI-1569
> URL: https://issues.apache.org/jira/browse/HUDI-1569
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: sivabalan narayanan
>Assignee: vinoyang
>Priority: Major
>  Labels: user-support-issues
>
> Add Flink examples to QuickStartUtils and Docker demo page. 





[jira] [Commented] (HUDI-1058) Make delete marker configurable

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437751#comment-17437751
 ] 

Sagar Sumit commented on HUDI-1058:
---

[~shenhong] Any update on this ticket? 
https://github.com/apache/hudi/pull/2311 was merged some time back. Good time to 
pick up this task?

> Make delete marker configurable
> ---
>
> Key: HUDI-1058
> URL: https://issues.apache.org/jira/browse/HUDI-1058
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Raymond Xu
>Assignee: shenh062326
>Priority: Major
>  Labels: pull-request-available, sev:normal, user-support-issues
>
> users can specify any boolean field for delete marker and 
> `_hoodie_is_deleted` remains as default.





[jira] [Updated] (HUDI-1058) Make delete marker configurable

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1058:
--
Status: Open  (was: New)

> Make delete marker configurable
> ---
>
> Key: HUDI-1058
> URL: https://issues.apache.org/jira/browse/HUDI-1058
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Raymond Xu
>Assignee: shenh062326
>Priority: Major
>  Labels: pull-request-available, sev:normal, user-support-issues
>
> users can specify any boolean field for delete marker and 
> `_hoodie_is_deleted` remains as default.





[jira] [Updated] (HUDI-2665) Overflow of DataOutputStream may lead to corrupted log block

2021-11-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2665:
-
Labels: pull-request-available  (was: )

> Overflow of DataOutputStream may lead to corrupted log block
> 
>
> Key: HUDI-2665
> URL: https://issues.apache.org/jira/browse/HUDI-2665
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: ZiyueGuan
>Assignee: ZiyueGuan
>Priority: Minor
>  Labels: pull-request-available
>
> In HoodieLogFormatWriter, we use the size() method of DataOutputStream to 
> calculate the size of the log block we write. However, this method only allows 
> sizes up to Integer.MAX_VALUE. When the bytes written overflow that limit, we 
> get a corrupted log block, as the size in the header is inconsistent with the 
> one in the footer.
> https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java
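
The saturation described above can be reproduced without actually writing 2 GB: `DataOutputStream` clamps its internal `written` counter at `Integer.MAX_VALUE` on overflow, so `size()` stops tracking the true position. The subclass below exists only to preset the protected counter for the demo.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SizeOverflowDemo {

    // Subclass only to preset the protected 'written' counter near the 2 GB
    // cap, so the clamping is observable without writing 2 GB of real data.
    static class NearCapStream extends DataOutputStream {
        NearCapStream(ByteArrayOutputStream out) {
            super(out);
            this.written = Integer.MAX_VALUE - 1;
        }
    }

    // Returns true when size() has saturated at Integer.MAX_VALUE even though
    // more bytes were actually written -- the inconsistency behind the ticket.
    static boolean sizeSaturates() {
        try {
            NearCapStream out = new NearCapStream(new ByteArrayOutputStream());
            out.write(new byte[16]); // 16 real bytes go out, but the counter clamps
            return out.size() == Integer.MAX_VALUE;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeSaturates()); // prints "true"
    }
}
```

This is why a writer that sizes blocks from `size()` must either track a `long` position itself or cap what a single append may write, as the linked PR does.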





[GitHub] [hudi] guanziyue opened a new pull request #3912: [HUDI-2665] Fix overflow of huge log file in HoodieLogFormatWriter

2021-11-02 Thread GitBox


guanziyue opened a new pull request #3912:
URL: https://github.com/apache/hudi/pull/3912


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   https://issues.apache.org/jira/browse/HUDI-2665
   
   ## Brief change log
   
   Allow HoodieLogFormatWriter to append at most 2GB of data per call to 
appendBlock, and add a check to prevent writing an oversized block.
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Commented] (HUDI-1081) Document AWS Hudi integration

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437750#comment-17437750
 ] 

Sagar Sumit commented on HUDI-1081:
---

[~uditme] Maybe we can add more details to this doc 
https://hudi.apache.org/docs/s3_hoodie

> Document AWS Hudi integration
> -
>
> Key: HUDI-1081
> URL: https://issues.apache.org/jira/browse/HUDI-1081
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs, Usability
>Affects Versions: 0.9.0
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Minor
>  Labels: documentation, user-support-issues
> Fix For: 0.10.0
>
>
> Oftentimes AWS Hudi users seek documentation on setting up Hudi and 
> integrating Hive metastore and Glue configurations. This has been one of the 
> most popular threads in Slack. It would serve well if documented.
>   





[jira] [Commented] (HUDI-1022) Document examples for Spark structured streaming writing into Hudi

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437748#comment-17437748
 ] 

Sagar Sumit commented on HUDI-1022:
---

[~FelixKJose] Are you actively working on this? I can assign it to myself if 
you are not.

> Document examples for Spark structured streaming writing into Hudi
> --
>
> Key: HUDI-1022
> URL: https://issues.apache.org/jira/browse/HUDI-1022
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Usability
>Reporter: Bhavani Sudha
>Assignee: Felix Kizhakkel Jose
>Priority: Minor
>  Labels: sev:normal, user-support-issues
>






[jira] [Assigned] (HUDI-851) Add Documentation on partitioning data with examples and details on how to sync to Hive

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-851:


Assignee: Sagar Sumit  (was: Bhavani Sudha)

> Add Documentation on partitioning data with examples and details on how to 
> sync to Hive
> ---
>
> Key: HUDI-851
> URL: https://issues.apache.org/jira/browse/HUDI-851
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs, docs-chinese
>Reporter: Bhavani Sudha
>Assignee: Sagar Sumit
>Priority: Minor
>  Labels: user-support-issues
>






[GitHub] [hudi] hudi-bot edited a comment on pull request #3486: [HUDI-2314] Add support for DynamoDb based lock

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3486:
URL: https://github.com/apache/hudi/pull/3486#issuecomment-899911684


   
   ## CI report:
   
   * d2b00796c9564088aa8533431c73251993f688d4 UNKNOWN
   * 99853468aec1becd1112c0ffba6ccf5f604e713d UNKNOWN
   * 210aa90b7cedc691b11d7e146a94ab199874ae50 UNKNOWN
   * 0876ffb9762eda7a914e0ac8978284726cc0b267 UNKNOWN
   * 248651db6b258d501cc0c5752a9b888137bff669 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3093)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3888: [HUDI-2624] Implement Non Index type for HUDI

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3888:
URL: https://github.com/apache/hudi/pull/3888#issuecomment-954503596


   
   ## CI report:
   
   * f20758076b7fd9355a6b3075fc03d93982b80cc9 UNKNOWN
   * 4efd2f4b2a47c7417aa6dc84ef40162637448fdf UNKNOWN
   * 49a813a067fccf3bad68648b7b0164d7d36a5947 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3092)
 
   
   
   






[jira] [Commented] (HUDI-489) Add hudi DataSource API example to hudi-examples

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437746#comment-17437746
 ] 

Sagar Sumit commented on HUDI-489:
--

Closing it. There is a datasource example now: 
https://github.com/apache/hudi/blob/master/hudi-examples/src/main/scala/org/apache/hudi/examples/spark/HoodieDataSourceExample.scala

> Add hudi DataSource API example to hudi-examples
> 
>
> Key: HUDI-489
> URL: https://issues.apache.org/jira/browse/HUDI-489
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: dengziming
>Assignee: dengziming
>Priority: Minor
>  Labels: starter, user-support-issues
>






[jira] [Updated] (HUDI-489) Add hudi DataSource API example to hudi-examples

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-489:
-
Status: Open  (was: New)

> Add hudi DataSource API example to hudi-examples
> 
>
> Key: HUDI-489
> URL: https://issues.apache.org/jira/browse/HUDI-489
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: dengziming
>Assignee: dengziming
>Priority: Minor
>  Labels: starter, user-support-issues
>






[jira] [Closed] (HUDI-489) Add hudi DataSource API example to hudi-examples

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-489.

Resolution: Fixed

> Add hudi DataSource API example to hudi-examples
> 
>
> Key: HUDI-489
> URL: https://issues.apache.org/jira/browse/HUDI-489
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: dengziming
>Assignee: dengziming
>Priority: Minor
>  Labels: starter, user-support-issues
>






[jira] [Closed] (HUDI-396) Provide an documentation to describe how to use test suite

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-396.

  Assignee: Xianghu Wang  (was: wangxianghu#1)
Resolution: Fixed

> Provide an documentation to describe how to use test suite
> --
>
> Key: HUDI-396
> URL: https://issues.apache.org/jira/browse/HUDI-396
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: vinoyang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: user-support-issues
>






[jira] [Commented] (HUDI-396) Provide an documentation to describe how to use test suite

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437745#comment-17437745
 ] 

Sagar Sumit commented on HUDI-396:
--

Closing this. The 
[readme|https://github.com/apache/hudi/tree/master/hudi-integ-test] has been 
updated with steps.

> Provide an documentation to describe how to use test suite
> --
>
> Key: HUDI-396
> URL: https://issues.apache.org/jira/browse/HUDI-396
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: vinoyang
>Assignee: wangxianghu#1
>Priority: Major
>  Labels: user-support-issues
>






[jira] [Commented] (HUDI-226) Hudi Website - Provide links to documentation corresponding to older release versions

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437744#comment-17437744
 ] 

Sagar Sumit commented on HUDI-226:
--

Closing it. With the site revamp, we have the capability to switch between 
docs of different versions.

> Hudi Website - Provide links to documentation corresponding to older release 
> versions
> -
>
> Key: HUDI-226
> URL: https://issues.apache.org/jira/browse/HUDI-226
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Docs, docs-chinese, newbie
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Major
>  Labels: user-support-issues
>
> While this may be too difficult to do it retroactively for previous versions, 
> we need to support this for apache releases. 
> See flink website (e:g - [https://flink.apache.org/] you will see a link 1.9 
> version  [https://ci.apache.org/projects/flink/flink-docs-release-1.9/]
> For older releases, 0.4.6 and 0.4.7, we have created git tags 
> *hoodie-site-0.4.6 and*  *hoodie-site-0.4.7* 
> *You can checkout the tags and read README.md to access and run website 
> locally.*
>  





[jira] [Closed] (HUDI-226) Hudi Website - Provide links to documentation corresponding to older release versions

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-226.

Resolution: Fixed

> Hudi Website - Provide links to documentation corresponding to older release 
> versions
> -
>
> Key: HUDI-226
> URL: https://issues.apache.org/jira/browse/HUDI-226
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Docs, docs-chinese, newbie
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Major
>  Labels: user-support-issues
>
> While this may be too difficult to do retroactively for previous versions, 
> we need to support this for Apache releases. 
> See the Flink website (e.g. [https://flink.apache.org/]); you will see a link 
> to the 1.9 version docs: [https://ci.apache.org/projects/flink/flink-docs-release-1.9/]
> For older releases, 0.4.6 and 0.4.7, we have created git tags 
> *hoodie-site-0.4.6 and*  *hoodie-site-0.4.7* 
> *You can checkout the tags and read README.md to access and run website 
> locally.*
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1120) Support spotless for scala

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1120:
--
Status: Patch Available  (was: In Progress)

> Support spotless for scala
> --
>
> Key: HUDI-1120
> URL: https://issues.apache.org/jira/browse/HUDI-1120
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Major
>  Labels: pull-request-available, sev:normal, user-support-issues
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1176) Support log4j2 config

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1176:
--
Status: In Progress  (was: Open)

> Support log4j2 config
> -
>
> Key: HUDI-1176
> URL: https://issues.apache.org/jira/browse/HUDI-1176
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Testing
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available, user-support-issues
>
> Some modules (like cli and client) now use log4j2, and it cannot correctly load 
> the config file (ERROR StatusLogger No log4j2 configuration file found. Using 
> default configuration: logging only errors to the console.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1176) Support log4j2 config

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1176:
--
Status: Patch Available  (was: In Progress)

> Support log4j2 config
> -
>
> Key: HUDI-1176
> URL: https://issues.apache.org/jira/browse/HUDI-1176
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Testing
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available, user-support-issues
>
> Some modules (like cli and client) now use log4j2, and it cannot correctly load 
> the config file (ERROR StatusLogger No log4j2 configuration file found. Using 
> default configuration: logging only errors to the console.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1176) Support log4j2 config

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1176:
--
Status: Open  (was: New)

> Support log4j2 config
> -
>
> Key: HUDI-1176
> URL: https://issues.apache.org/jira/browse/HUDI-1176
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Testing
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available, user-support-issues
>
> Some modules (like cli and client) now use log4j2, and it cannot correctly load 
> the config file (ERROR StatusLogger No log4j2 configuration file found. Using 
> default configuration: logging only errors to the console.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yuzhaojing commented on pull request #3888: [HUDI-2624] Implement Non Index type for HUDI

2021-11-02 Thread GitBox


yuzhaojing commented on pull request #3888:
URL: https://github.com/apache/hudi/pull/3888#issuecomment-958673059


   > @yuzhaojing check CI?
   
   CI has all succeeded.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2616) Implement BloomIndex for Dataset

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2616:
-
Priority: Critical  (was: Blocker)

> Implement BloomIndex for Dataset
> -
>
> Key: HUDI-2616
> URL: https://issues.apache.org/jira/browse/HUDI-2616
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Critical
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2616) Implement BloomIndex for Dataset

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2616:
-
Fix Version/s: (was: 0.10.0)

> Implement BloomIndex for Dataset
> -
>
> Key: HUDI-2616
> URL: https://issues.apache.org/jira/browse/HUDI-2616
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2531) [UMBRELLA] Support Dataset APIs in writer paths

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2531:
-
Fix Version/s: (was: 0.10.0)

> [UMBRELLA] Support Dataset APIs in writer paths
> ---
>
> Key: HUDI-2531
> URL: https://issues.apache.org/jira/browse/HUDI-2531
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: Raymond Xu
>Priority: Blocker
>  Labels: hudi-umbrellas
>
> To make use of Dataset APIs in writer paths instead of RDD.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2620) Benchmark SparkDataFrameWriteClient

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2620:
-
Fix Version/s: (was: 0.10.0)

> Benchmark SparkDataFrameWriteClient
> ---
>
> Key: HUDI-2620
> URL: https://issues.apache.org/jira/browse/HUDI-2620
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2620) Benchmark SparkDataFrameWriteClient

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2620:
-
Priority: Major  (was: Blocker)

> Benchmark SparkDataFrameWriteClient
> ---
>
> Key: HUDI-2620
> URL: https://issues.apache.org/jira/browse/HUDI-2620
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2665) Overflow of DataOutputStream may lead to corrupted log block

2021-11-02 Thread ZiyueGuan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZiyueGuan updated HUDI-2665:

Priority: Minor  (was: Major)

> Overflow of DataOutputStream may lead to corrupted log block
> 
>
> Key: HUDI-2665
> URL: https://issues.apache.org/jira/browse/HUDI-2665
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: ZiyueGuan
>Assignee: ZiyueGuan
>Priority: Minor
>
> In HoodieLogFormatWriter, we use the size() method of DataOutputStream to 
> calculate the size of the log block we write. However, this method cannot 
> report a size larger than Integer.MAX_VALUE. When the bytes we have written 
> overflow, we get a corrupted log block, as the size in the header is 
> inconsistent with the one in the footer.
> https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
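
The int saturation described in HUDI-2665 above can be avoided by tracking the written bytes with a long. The class below is an illustrative sketch (the name `LongCountingOutputStream` and its use are assumptions, not Hudi's actual fix):

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// DataOutputStream.size() returns an int and saturates at Integer.MAX_VALUE,
// so a log block larger than ~2 GiB would record the wrong length in its
// header. Counting the written bytes with a long sidesteps that limit.
public class LongCountingOutputStream extends FilterOutputStream {
  private long count;

  public LongCountingOutputStream(OutputStream out) {
    super(out);
  }

  @Override
  public void write(int b) throws IOException {
    out.write(b);
    count++;
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    // Delegate the array write directly to avoid FilterOutputStream's
    // default byte-by-byte loop.
    out.write(b, off, len);
    count += len;
  }

  public long getCount() {
    return count;
  }
}
```

A writer wrapping its block stream in such a counter could then validate that the header and footer sizes agree before sealing the block.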


[jira] [Updated] (HUDI-1430) Implement SparkDataFrameWriteClient with SimpleIndex

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1430:
-
Fix Version/s: (was: 0.10.0)

> Implement SparkDataFrameWriteClient with SimpleIndex
> 
>
> Key: HUDI-1430
> URL: https://issues.apache.org/jira/browse/HUDI-1430
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: sivabalan narayanan
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
>
> End to end upsert operation, with proper functional tests coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2615) Decouple HoodieRecordPayload with Hoodie table, table services, and index

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2615:
-
Fix Version/s: (was: 0.10.0)

> Decouple HoodieRecordPayload with Hoodie table, table services, and index
> -
>
> Key: HUDI-2615
> URL: https://issues.apache.org/jira/browse/HUDI-2615
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>
> HoodieTable, HoodieIndex, and compaction, clustering services should be 
> independent of HoodieRecordPayload



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2615) Decouple HoodieRecordPayload with Hoodie table, table services, and index

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2615:
-
Priority: Major  (was: Blocker)

> Decouple HoodieRecordPayload with Hoodie table, table services, and index
> -
>
> Key: HUDI-2615
> URL: https://issues.apache.org/jira/browse/HUDI-2615
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
> Fix For: 0.10.0
>
>
> HoodieTable, HoodieIndex, and compaction, clustering services should be 
> independent of HoodieRecordPayload



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1430) Implement SparkDataFrameWriteClient with SimpleIndex

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1430:
-
Priority: Major  (was: Blocker)

> Implement SparkDataFrameWriteClient with SimpleIndex
> 
>
> Key: HUDI-1430
> URL: https://issues.apache.org/jira/browse/HUDI-1430
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: sivabalan narayanan
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
>
> End to end upsert operation, with proper functional tests coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2621) Enhance DataFrameWriter with small file handling

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2621:
-
Priority: Major  (was: Blocker)

> Enhance DataFrameWriter with small file handling
> 
>
> Key: HUDI-2621
> URL: https://issues.apache.org/jira/browse/HUDI-2621
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2617) Implement HBase Index for Dataset

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2617:
-
Priority: Major  (was: Blocker)

> Implement HBase Index for Dataset
> --
>
> Key: HUDI-2617
> URL: https://issues.apache.org/jira/browse/HUDI-2617
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2618) Implement write operations other than upsert in SparkDataFrameWriteClient

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2618:
-
Priority: Major  (was: Blocker)

> Implement write operations other than upsert in SparkDataFrameWriteClient
> -
>
> Key: HUDI-2618
> URL: https://issues.apache.org/jira/browse/HUDI-2618
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Priority: Major
>
> insert, insert_prepped, insert_overwrite, insert_overwrite_table, delete, 
> delete_partitions, bulk_insert



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2617) Implement HBase Index for Dataset

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2617:
-
Fix Version/s: (was: 0.10.0)

> Implement HBase Index for Dataset
> --
>
> Key: HUDI-2617
> URL: https://issues.apache.org/jira/browse/HUDI-2617
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2621) Enhance DataFrameWriter with small file handling

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2621:
-
Fix Version/s: (was: 0.10.0)

> Enhance DataFrameWriter with small file handling
> 
>
> Key: HUDI-2621
> URL: https://issues.apache.org/jira/browse/HUDI-2621
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2622) Enhance DataFrameWriter with LazyIterator and SpillableMap

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2622:
-
Fix Version/s: (was: 0.10.0)

> Enhance DataFrameWriter with LazyIterator and SpillableMap
> --
>
> Key: HUDI-2622
> URL: https://issues.apache.org/jira/browse/HUDI-2622
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2618) Implement write operations other than upsert in SparkDataFrameWriteClient

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2618:
-
Fix Version/s: (was: 0.10.0)

> Implement write operations other than upsert in SparkDataFrameWriteClient
> -
>
> Key: HUDI-2618
> URL: https://issues.apache.org/jira/browse/HUDI-2618
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Priority: Blocker
>
> insert, insert_prepped, insert_overwrite, insert_overwrite_table, delete, 
> delete_partitions, bulk_insert



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2622) Enhance DataFrameWriter with LazyIterator and SpillableMap

2021-11-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2622:
-
Priority: Major  (was: Blocker)

> Enhance DataFrameWriter with LazyIterator and SpillableMap
> --
>
> Key: HUDI-2622
> URL: https://issues.apache.org/jira/browse/HUDI-2622
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-233) Redo log statements using SLF4J

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit resolved HUDI-233.
--
Resolution: Fixed

> Redo log statements using SLF4J 
> 
>
> Key: HUDI-233
> URL: https://issues.apache.org/jira/browse/HUDI-233
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Code Cleanup, newbie
>Affects Versions: 0.5.0
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently we are not employing variable substitution aggressively in the 
> project, à la 
> {code:java}
> LogManager.getLogger(SomeName.class.getName()).info("Message: {}, Detail: 
> {}", message, detail);
> {code}
> This can improve performance since the string concatenation is deferred until 
> the logging is actually in effect.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
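
The deferred-formatting idea behind HUDI-233's "{}" placeholders can be illustrated with the JDK's own logger, whose Supplier overloads build the message string only if the level is enabled (this is a stand-in sketch; the ticket itself concerns SLF4J, not java.util.logging):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Demonstrates lazy message construction: the Supplier passed to fine() is
// never invoked when FINE logging is disabled, so the cost of building the
// message string is only paid when the log statement is actually in effect.
public class LazyLoggingDemo {
  private static final Logger LOG = Logger.getLogger(LazyLoggingDemo.class.getName());

  static String expensiveDetail() {
    // Imagine serializing a large record here; with eager string
    // concatenation this cost is paid even when the level is disabled.
    return "detail";
  }

  public static void main(String[] args) {
    LOG.setLevel(Level.INFO);
    // Supplier is not invoked: FINE is below the INFO threshold.
    LOG.fine(() -> "Message: " + expensiveDetail());
    // Supplier is invoked: INFO passes the level check.
    LOG.info(() -> "Starting with detail: " + expensiveDetail());
  }
}
```

SLF4J achieves the same effect with `log.info("Detail: {}", detail)`, where the placeholder substitution happens only after the level check passes.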


[jira] [Updated] (HUDI-1872) Move HoodieFlinkStreamer into hudi-utilities module

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1872:
--
Status: Patch Available  (was: In Progress)

> Move HoodieFlinkStreamer into hudi-utilities module
> ---
>
> Key: HUDI-1872
> URL: https://issues.apache.org/jira/browse/HUDI-1872
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Vinay
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1528) hudi-sync-tools error

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437738#comment-17437738
 ] 

Sagar Sumit commented on HUDI-1528:
---

It's working now. I have shared a code snippet here: 
https://github.com/apache/hudi/issues/2439#issuecomment-930059601 
Closing this issue.

> hudi-sync-tools error
> -
>
> Key: HUDI-1528
> URL: https://issues.apache.org/jira/browse/HUDI-1528
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Trevorzhang
>Assignee: Trevorzhang
>Priority: Major
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.10.0
>
>
> When using hudi-sync-tools to synchronize to a remote Hive, the Hive metastore 
> throws exceptions.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1528) hudi-sync-tools error

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-1528.
-
Resolution: Fixed

> hudi-sync-tools error
> -
>
> Key: HUDI-1528
> URL: https://issues.apache.org/jira/browse/HUDI-1528
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Trevorzhang
>Assignee: Trevorzhang
>Priority: Major
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.10.0
>
>
> When using hudi-sync-tools to synchronize to a remote Hive, the Hive metastore 
> throws exceptions.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xushiyan commented on pull request #3053: [HUDI-1932] Update Hive sync timestamp when change detected

2021-11-02 Thread GitBox


xushiyan commented on pull request #3053:
URL: https://github.com/apache/hudi/pull/3053#issuecomment-958666489


   @zuyanton yes we're planning it for 0.10 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2663) Incorrect deletion of heartbeat files for inflight commits

2021-11-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-2663:
-
Priority: Blocker  (was: Critical)

> Incorrect deletion of heartbeat files for inflight commits
> --
>
> Key: HUDI-2663
> URL: https://issues.apache.org/jira/browse/HUDI-2663
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.10.0
>
>
> https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java#L818
> AbstractHoodieWriteClient.java
>   
> HeartbeatUtils.cleanExpiredHeartbeats(this.heartbeatClient.getAllExistingHeartbeatInstants(),
> This method blindly deletes the heartbeat files for all inflight commits. 
> This causes other commits to see their heartbeats as missing, which in turn 
> rolls back those other inflight commits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
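
The fix direction implied by HUDI-2663 is to delete only heartbeat files that have actually expired, rather than every inflight commit's file. The sketch below is purely illustrative: `HeartbeatCleaner`, its method names, and the mtime-based expiry check are assumptions, not Hudi's real heartbeat client API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a live writer periodically refreshes its heartbeat
// file's modification time, so only files that have gone stale (no update
// within the expiry interval) are safe to delete.
public class HeartbeatCleaner {

  /** Deletes expired heartbeat files and returns the affected instant names. */
  public static List<String> cleanExpiredHeartbeats(
      File heartbeatDir, long expiryMillis, long nowMillis) {
    List<String> deleted = new ArrayList<>();
    File[] files = heartbeatDir.listFiles();
    if (files == null) {
      return deleted;
    }
    for (File f : files) {
      // Keep files whose heartbeat is still fresh; they belong to writers
      // that are still alive and must not be rolled back.
      if (nowMillis - f.lastModified() > expiryMillis && f.delete()) {
        deleted.add(f.getName());
      }
    }
    return deleted;
  }
}
```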


[jira] [Updated] (HUDI-2663) Incorrect deletion of heartbeat files for inflight commits

2021-11-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-2663:
-
Story Points: 10

> Incorrect deletion of heartbeat files for inflight commits
> --
>
> Key: HUDI-2663
> URL: https://issues.apache.org/jira/browse/HUDI-2663
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.10.0
>
>
> https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java#L818
> AbstractHoodieWriteClient.java
>   
> HeartbeatUtils.cleanExpiredHeartbeats(this.heartbeatClient.getAllExistingHeartbeatInstants(),
> This method blindly deletes the heartbeat files for all inflight commits. 
> This causes other commits to see their heartbeats as missing, which in turn 
> rolls back those other inflight commits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2663) Incorrect deletion of heartbeat files for inflight commits

2021-11-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-2663:


Assignee: Vinoth Chandar

> Incorrect deletion of heartbeat files for inflight commits
> --
>
> Key: HUDI-2663
> URL: https://issues.apache.org/jira/browse/HUDI-2663
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.10.0
>
>
> https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java#L818
> AbstractHoodieWriteClient.java
>   
> HeartbeatUtils.cleanExpiredHeartbeats(this.heartbeatClient.getAllExistingHeartbeatInstants(),
> This method blindly deletes the heartbeat files for all inflight commits. 
> This causes other commits to see their heartbeats as missing, which in turn 
> rolls back those other inflight commits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3911: [HUDI-2676] Hudi should synchronize owner information to hudi _rt/_ro…

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3911:
URL: https://github.com/apache/hudi/pull/3911#issuecomment-958636680


   
   ## CI report:
   
   * 90b58a3afad964af9d252a3633b555a21253df7d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3091)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1475) Fix documentation of preCombine to clarify when this API is used by Hudi

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1475:
--
Status: Closed  (was: Patch Available)

> Fix documentation of preCombine to clarify when this API is used by Hudi 
> -
>
> Key: HUDI-1475
> URL: https://issues.apache.org/jira/browse/HUDI-1475
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Docs
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.10.0
>
>
> We need to fix the Javadoc of preCombine in HoodieRecordPayload to clarify 
> that this method is used to pre-merge unmerged (compaction) records and 
> incoming records before the merge with the existing record in the dataset.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-718) java.lang.ClassCastException during upsert

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-718:
-
Status: In Progress  (was: Open)

> java.lang.ClassCastException during upsert
> --
>
> Key: HUDI-718
> URL: https://issues.apache.org/jira/browse/HUDI-718
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: lamber-ken
>Priority: Major
>  Labels: sev:high, user-support-issues
> Fix For: 0.10.0
>
> Attachments: image-2020-03-21-16-49-28-905.png
>
>
> The dataset was created using Hudi 0.5, and we are now trying to migrate it to 
> the latest master. The table is written using SqlTransformer. Exception:
>  
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge 
> old record into new file for key bla.bla from old file 
> gs://../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_0-35-1196_20200316234140.parquet
>  to new file 
> gs://.../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_1-39-1506_20200317190948.parquet
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:246)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:433)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:423)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  ... 3 more
> Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be 
> cast to org.apache.avro.generic.GenericFixed
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:336)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:275)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
>  at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>  at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:242)
>  ... 8 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-718) java.lang.ClassCastException during upsert

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-718:
-
Status: Closed  (was: Patch Available)

> java.lang.ClassCastException during upsert
> --
>
> Key: HUDI-718
> URL: https://issues.apache.org/jira/browse/HUDI-718
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: lamber-ken
>Priority: Major
>  Labels: sev:high, user-support-issues
> Fix For: 0.10.0
>
> Attachments: image-2020-03-21-16-49-28-905.png
>
>
> The dataset was created using Hudi 0.5, and we are now trying to migrate it to 
> the latest master. The table is written using SqlTransformer. Exception:
>  
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge 
> old record into new file for key bla.bla from old file 
> gs://../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_0-35-1196_20200316234140.parquet
>  to new file 
> gs://.../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_1-39-1506_20200317190948.parquet
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:246)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:433)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:423)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  ... 3 more
> Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be 
> cast to org.apache.avro.generic.GenericFixed
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:336)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:275)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
>  at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>  at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:242)
>  ... 8 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-718) java.lang.ClassCastException during upsert

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-718:
-
Status: Patch Available  (was: In Progress)

> java.lang.ClassCastException during upsert
> --
>
> Key: HUDI-718
> URL: https://issues.apache.org/jira/browse/HUDI-718
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: lamber-ken
>Priority: Major
>  Labels: sev:high, user-support-issues
> Fix For: 0.10.0
>
> Attachments: image-2020-03-21-16-49-28-905.png
>
>
> The dataset was created using Hudi 0.5 and we are now trying to migrate it to 
> the latest master. The table is written using SqlTransformer. Exception:
>  
> Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge 
> old record into new file for key bla.bla from old file 
> gs://../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_0-35-1196_20200316234140.parquet
>  to new file 
> gs://.../2020/03/15/7b75931f-ff2f-4bf4-8949-5c437112be79-0_1-39-1506_20200317190948.parquet
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:246)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:433)
>  at 
> org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:423)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  ... 3 more
> Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be 
> cast to org.apache.avro.generic.GenericFixed
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:336)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:275)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
>  at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>  at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:103)
>  at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:242)
>  ... 8 more
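
The root cause is visible in the last frames: `AvroWriteSupport.writeValueWithoutConversion` casts the datum of a `fixed`-typed field to `GenericFixed`, so a field that arrives as an Avro `Utf8` string (for example, after a schema drifted between a `fixed`/decimal type and `string`) fails with exactly this `ClassCastException`. Below is a minimal, stdlib-only sketch of the failure mode; `Utf8` and `GenericFixed` here are hypothetical stand-ins for the Avro classes, not the real ones:

```java
// Hypothetical stand-ins for org.apache.avro.util.Utf8 and
// org.apache.avro.generic.GenericFixed -- not the real Avro classes.
public class CastDemo {
    static final class Utf8 {
        final String value;
        Utf8(String v) { value = v; }
    }

    interface GenericFixed {
        byte[] bytes();
    }

    public static void main(String[] args) {
        // The writer received a string datum for a field whose schema
        // says "fixed", so the write path's downcast fails at runtime.
        Object datum = new Utf8("2020-03-15");
        try {
            GenericFixed fixed = (GenericFixed) datum; // mirrors the cast in the write path
            System.out.println(fixed.bytes().length);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The practical takeaway is that the Avro schema used by the writer must agree with the runtime type of each datum; a value that is a string at runtime cannot be written through a `fixed`-typed field.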



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3486: [HUDI-2314] Add support for DynamoDb based lock

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3486:
URL: https://github.com/apache/hudi/pull/3486#issuecomment-899911684


   
   ## CI report:
   
   * d2b00796c9564088aa8533431c73251993f688d4 UNKNOWN
   * 99853468aec1becd1112c0ffba6ccf5f604e713d UNKNOWN
   * 093275425688b2572da5e857899fecbc0c718cf2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3057)
 
   * 210aa90b7cedc691b11d7e146a94ab199874ae50 UNKNOWN
   * 0876ffb9762eda7a914e0ac8978284726cc0b267 UNKNOWN
   * 248651db6b258d501cc0c5752a9b888137bff669 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3093)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-943) Slow performance observed when inserting data into Hudi table

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437732#comment-17437732
 ] 

Sagar Sumit commented on HUDI-943:
--

[~h117561964] [~vbalaji] Is this still an issue? HoodieSparkSqlWriter has 
changed significantly since this issue was reported.

> Slow performance observed when inserting data into Hudi table
> -
>
> Key: HUDI-943
> URL: https://issues.apache.org/jira/browse/HUDI-943
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Performance
>Reporter: Sam Huang
>Assignee: Balaji Varadarajan
>Priority: Blocker
>  Labels: performance, sev:critical, user-support-issues
>
> I am using the Datasource Writer API to write 5000 records into a Hudi 
> copy-on-write table, each with 8 columns; the total size is less than 1 MB. 
> Please refer to the code below.
>  
> {code:java}
> Dataset<Row> ds1 = spark.read().json(jsc.parallelize(records, 2)); 
> DataFrameWriter<Row> writer = ds1.write().format("org.apache.hudi") 
> .option("hoodie.insert.shuffle.parallelism", 2) 
> .option("hoodie.upsert.shuffle.parallelism", 2) 
> .option(DataSourceWriteOptions.OPERATION_OPT_KEY(), 
> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL()) 
> .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY(), tableType) 
> .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), recordKey) 
> .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), partitionPath) 
> .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), precombineKey) 
> .option(HoodieWriteConfig.TABLE_NAME, hudiTableName) 
> .option(HoodieCompactionConfig.INLINE_COMPACT_PROP, hudiWriteCompactEnabled) 
> .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY(), hudiTableName) 
> .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY(), hiveDatabase) 
> .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), hiveServerUrl) 
> .option(DataSourceWriteOptions.HIVE_USER_OPT_KEY(), hiveUser) 
> .option(DataSourceWriteOptions.HIVE_PASS_OPT_KEY(), hivePassword) 
> .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY(), hiveSyncEnabled) 
> .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY(), 
> partitionPath)    .mode(SaveMode.Append); 
> writer.save(basePath);
> {code}
>  
> At the beginning, it takes only 3~4 seconds to finish the insert operation, 
> but it gets progressively slower, taking about 30 seconds after 5 minutes.
> The Spark logs below show that most of the time is spent on the 
> HoodieSparkSqlWriter count task.
>  
> {noformat}
> 2020-05-25 16:36:37,851 | INFO  | [dag-scheduler-event-loop] | Adding task 
> set 185.0 with 1 tasks | 
> org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2020-05-25 16:36:37,851 | INFO  | [dispatcher-event-loop-0] | Starting task 
> 0.0 in stage 185.0 (TID 190, node-ana-corepOlf, executor 2, partition 0, 
> NODE_LOCAL, 7651 bytes) | 
> org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2020-05-25 16:36:37,858 | INFO  | [dispatcher-event-loop-1] | Added 
> broadcast_124_piece0 in memory on node-ana-corepOlf:36554 (size: 138.1 KB, 
> free: 29.2 GB) | 
> org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2020-05-25 16:36:37,887 | INFO  | [dispatcher-event-loop-1] | Asked to send 
> map output locations for shuffle 53 to 10.155.114.97:32461 | 
> org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2020-05-25 16:37:11,098 | INFO  | [dispatcher-event-loop-0] | Added rdd_381_0 
> in memory on node-ana-corepOlf:36554 (size: 387.0 B, free: 29.2 GB) | 
> org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2020-05-25 16:37:11,111 | INFO  | [task-result-getter-2] | Finished task 0.0 
> in stage 185.0 (TID 190) in 33260 ms on node-ana-corepOlf (executor 2) (1/1) 
> | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2020-05-25 16:37:11,111 | INFO  | [task-result-getter-2] | Removed TaskSet 
> 185.0, whose tasks have all completed, from pool  | 
> org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2020-05-25 16:37:11,112 | INFO  | [dag-scheduler-event-loop] | ResultStage 
> 185 (count at HoodieSparkSqlWriter.scala:254) finished in 33.308 s | 
> org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> 2020-05-25 16:37:11,113 | INFO  | [Driver] | Job 70 finished: count at 
> HoodieSparkSqlWriter.scala:254, took 33.438673 s | 
> org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
> {noformat}
>  
> I tried tuning the parameter hoodie.insert.shuffle.parallelism to 20, but it 
> did not help. The CPU/heap usages are all normal.
>  
> Below is the setting for application.
> Executor instance: 2
> Executor memory: 55g
> Executor cores: 4
> Driver memory: 4g
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * a3677e66a1fb13c1a91d6beb977b00ddfdd6a51e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3089)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-2151) Make performant out-of-box configs

2021-11-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-2151:


Assignee: Raymond Xu  (was: sivabalan narayanan)

> Make performant out-of-box configs
> --
>
> Key: HUDI-2151
> URL: https://issues.apache.org/jira/browse/HUDI-2151
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup, Docs
>Reporter: Vinoth Chandar
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We have quite a few configs that deliver better performance or usability 
> but are guarded by flags. 
>  This is to identify them, change them, test them (functionally and for 
> performance), and make them the default.
>  
> Need to ensure we also capture all the backwards compatibility issues that 
> can arise



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3899: [HUDI-2660] Delete the view storage properties first before creation

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3899:
URL: https://github.com/apache/hudi/pull/3899#issuecomment-956165515


   
   ## CI report:
   
   * c30db533861087c73d6d71e68cc6fdc00985803b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3090)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1609) Issues w/ using hive metastore by disabling jdbc

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437728#comment-17437728
 ] 

Sagar Sumit commented on HUDI-1609:
---

It should have been fixed now. I'll verify it.

> Issues w/ using hive metastore by disabling jdbc
> 
>
> Key: HUDI-1609
> URL: https://issues.apache.org/jira/browse/HUDI-1609
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration
>Affects Versions: 0.9.0
>Reporter: sivabalan narayanan
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.10.0
>
>
> Ref: https://github.com/apache/hudi/issues/1679



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1609) Issues w/ using hive metastore by disabling jdbc

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1609:
--
Status: In Progress  (was: Open)

> Issues w/ using hive metastore by disabling jdbc
> 
>
> Key: HUDI-1609
> URL: https://issues.apache.org/jira/browse/HUDI-1609
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration
>Affects Versions: 0.9.0
>Reporter: sivabalan narayanan
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.10.0
>
>
> Ref: https://github.com/apache/hudi/issues/1679



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2480) FileSlice after pending compaction-requested instant-time is ignored by MOR snapshot reader

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-2480:
--
Status: Patch Available  (was: In Progress)

> FileSlice after pending compaction-requested instant-time is ignored by MOR 
> snapshot reader
> ---
>
> Key: HUDI-2480
> URL: https://issues.apache.org/jira/browse/HUDI-2480
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration, Spark Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2480) FileSlice after pending compaction-requested instant-time is ignored by MOR snapshot reader

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-2480:
--
Status: In Progress  (was: Open)

> FileSlice after pending compaction-requested instant-time is ignored by MOR 
> snapshot reader
> ---
>
> Key: HUDI-2480
> URL: https://issues.apache.org/jira/browse/HUDI-2480
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration, Spark Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2493) Verify removing glob pattern works w/ all key generators

2021-11-02 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437727#comment-17437727
 ] 

Sagar Sumit commented on HUDI-2493:
---

[~rxu] [~shivnarayan] Can we close this in favour of 
[HUDI-2590|https://issues.apache.org/jira/browse/HUDI-2590]?

> Verify removing glob pattern works w/ all key generators
> 
>
> Key: HUDI-2493
> URL: https://issues.apache.org/jira/browse/HUDI-2493
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: sivabalan narayanan
>Assignee: Raymond Xu
>Priority: Major
>  Labels: sev:critical
>
> In the last release we added support for removing the glob pattern, i.e. 
> while reading a Hudi dataset, 
> {code:java}
> spark.read.format("hudi").load(basePath+"/*/*") 
> {code}
> -> 
> {code:java}
> spark.read.format("hudi").load(basePath)
> {code}
> Suffixing with
> {code:java}
> "/*/*"
> {code}
>  is not required anymore. 
> But we need to verify that the same works for all key generators before we 
> can announce it as generally usable. Otherwise, we have to call out which key 
> generators it works for, and put in a fix for those that do not.
>  
> For eg:
> I tried removing the glob pattern from a few key generator tests in 
> TestCOWDataSource and it failed. 
> [https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala#L413]
>  
> Ones that worked after removing the glob pattern: 
> SimpleKeyGenerator, ComplexKeyGenerator, GlobalDeleteKeyGenerator, 
> NonpartitionedKeyGenerator
> Ones that did not work:
> CustomKeyGenerator, TimestampBasedKeyGenerator
>  
> You can try it locally by removing the glob pattern for these tests.
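
The failures for CustomKeyGenerator and TimestampBasedKeyGenerator are consistent with the glob's depth assumption: `/*/*` hard-codes exactly two partition path segments, while those generators can emit a different depth (e.g. `yyyy/MM/dd`). A stdlib sketch of that depth mismatch, using `java.nio.file` glob matching purely as an illustration (the paths are made up, not from the tests):

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

public class GlobDepthDemo {
    public static void main(String[] args) {
        // "*/*" matches exactly two path segments under the base path,
        // because "*" does not cross directory boundaries in glob syntax.
        PathMatcher twoLevel = FileSystems.getDefault().getPathMatcher("glob:*/*");

        System.out.println(twoLevel.matches(Path.of("2020/03")));       // two levels: matches
        System.out.println(twoLevel.matches(Path.of("2020/03/15")));    // three levels: no match
        System.out.println(twoLevel.matches(Path.of("partition=us"))); // one level: no match
    }
}
```

Loading by `basePath` alone sidesteps this, but only if path discovery handles every partition depth a key generator can produce, which is what this ticket asks to verify.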
> stacktrace for timestamp based key gen
> {code:java}
> 0    [main] WARN  org.apache.spark.util.Utils  - Your hostname, 
> Sivabalans-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 
> 10.0.0.202 instead (on interface en0)0    [main] WARN  
> org.apache.spark.util.Utils  - Your hostname, Sivabalans-MacBook-Pro.local 
> resolves to a loopback address: 127.0.0.1; using 10.0.0.202 instead (on 
> interface en0)1    [main] WARN  org.apache.spark.util.Utils  - Set 
> SPARK_LOCAL_IP if you need to bind to another address390  [main] WARN  
> org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable11317 
> [main] WARN  org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata 
> table was not found at path 
> file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y14gn/T/junit1114662825639168138/dataset/.hoodie/metadata11515
>  [main] WARN  org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata 
> table was not found at path 
> file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y14gn/T/junit1114662825639168138/dataset/.hoodie/metadata11840
>  [main] WARN  org.apache.spark.util.Utils  - Truncated the string 
> representation of a plan since it was too large. This behavior can be 
> adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.12319 
> [main] WARN  org.apache.hudi.testutils.HoodieClientTestHarness  - Closing 
> file-system instance used in previous test-run
> org.opentest4j.AssertionFailedError: Expected :true, Actual :false at 
> org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55) at 
> org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:40) at 
> org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:35) at 
> org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:162) at 
> org.apache.hudi.functional.TestCOWDataSource.testSparkPartitonByWithTimestampBasedKeyGenerator(TestCOWDataSource.scala:517)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
>  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
>  at 
> org.junit.jupiter.engine.execution.Executabl

[GitHub] [hudi] vinothchandar merged pull request #3907: [HUDI-2670] - relative links broken in docs

2021-11-02 Thread GitBox


vinothchandar merged pull request #3907:
URL: https://github.com/apache/hudi/pull/3907


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch asf-site updated: [HUDI-2670] - relative links broken in docs (#3907)

2021-11-02 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new aee584d  [HUDI-2670] - relative links broken in docs (#3907)
aee584d is described below

commit aee584dc6ee4d2691225518954aad03af68eb7ff
Author: Kyle Weller 
AuthorDate: Tue Nov 2 21:49:09 2021 -0700

[HUDI-2670] - relative links broken in docs (#3907)


* added new docs to current version to fix broken relative links
---
 .../version-0.9.0/hoodie_deltastreamer.md  | 211 +
 .../version-0.9.0/query_engine_setup.md|  46 +
 .../versioned_docs/version-0.9.0/table_types.md|   7 +
 3 files changed, 264 insertions(+)

diff --git a/website/versioned_docs/version-0.9.0/hoodie_deltastreamer.md 
b/website/versioned_docs/version-0.9.0/hoodie_deltastreamer.md
new file mode 100644
index 000..a97f1cb
--- /dev/null
+++ b/website/versioned_docs/version-0.9.0/hoodie_deltastreamer.md
@@ -0,0 +1,211 @@
+---
+title: Streaming Ingestion
+keywords: [hudi, deltastreamer, hoodiedeltastreamer]
+---
+
+## DeltaStreamer
+
+The `HoodieDeltaStreamer` utility (part of hudi-utilities-bundle) provides the 
way to ingest from different sources such as DFS or Kafka, with the following 
capabilities.
+
+- Exactly once ingestion of new events from Kafka, [incremental 
imports](https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide#_incremental_imports)
 from Sqoop or output of `HiveIncrementalPuller` or files under a DFS folder
+- Support json, avro or a custom record types for the incoming data
+- Manage checkpoints, rollback & recovery
+- Leverage Avro schemas from DFS or Confluent [schema 
registry](https://github.com/confluentinc/schema-registry).
+- Support for plugging in transformations
+
+Command line options describe capabilities in more detail
+
+```java
+[hoodie]$ spark-submit --class 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls 
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` --help
+Usage:  [options]
+Options:
+--checkpoint
+  Resume Delta Streamer from this checkpoint.
+--commit-on-errors
+  Commit even when some records failed to be written
+  Default: false
+--compact-scheduling-minshare
+  Minshare for compaction as defined in
+  https://spark.apache.org/docs/latest/job-scheduling
+  Default: 0
+--compact-scheduling-weight
+  Scheduling weight for compaction as defined in
+  https://spark.apache.org/docs/latest/job-scheduling
+  Default: 1
+--continuous
+  Delta Streamer runs in continuous mode running source-fetch -> Transform
+  -> Hudi Write in loop
+  Default: false
+--delta-sync-scheduling-minshare
+  Minshare for delta sync as defined in
+  https://spark.apache.org/docs/latest/job-scheduling
+  Default: 0
+--delta-sync-scheduling-weight
+  Scheduling weight for delta sync as defined in
+  https://spark.apache.org/docs/latest/job-scheduling
+  Default: 1
+--disable-compaction
+  Compaction is enabled for MoR table by default. This flag disables it
+  Default: false
+--enable-hive-sync
+  Enable syncing to hive
+  Default: false
+--filter-dupes
+  Should duplicate records from source be dropped/filtered out before
+  insert/bulk-insert
+  Default: false
+--help, -h
+
+--hoodie-conf
+  Any configuration that can be set in the properties file (using the CLI
+  parameter "--propsFilePath") can also be passed command line using this
+  parameter
+  Default: []
+--max-pending-compactions
+  Maximum number of outstanding inflight/requested compactions. Delta Sync
+  will not happen unlessoutstanding compactions is less than this number
+  Default: 5
+--min-sync-interval-seconds
+  the min sync interval of each sync in continuous mode
+  Default: 0
+--op
+  Takes one of these values : UPSERT (default), INSERT (use when input is
+  purely new data/inserts to gain speed)
+  Default: UPSERT
+  Possible Values: [UPSERT, INSERT, BULK_INSERT]
+--payload-class
+  subclass of HoodieRecordPayload, that works off a GenericRecord.
+  Implement your own, if you want to do something other than overwriting
+  existing value
+  Default: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
+--props
+  path to properties file on localfs or dfs, with configurations for
+  hoodie client, schema provider, key generator and data source. For
+  hoodie client props, sane defaults are used, but recommend use to
+  provide basic things like metrics endpoints, hive configs etc. For
+  sources, refer to individual classes for supported properties.
+  Default: 
file:///Users/vinoth/bin/hoodie/src/test/resources/delta-streamer-config/dfs-source.

[GitHub] [hudi] vinothchandar commented on pull request #3907: [HUDI-2670] - relative links broken in docs

2021-11-02 Thread GitBox


vinothchandar commented on pull request #3907:
URL: https://github.com/apache/hudi/pull/3907#issuecomment-958653666


   @kywe665 there are 24 commits in this PR, even though only 3 files are 
changed. For every PR, you can use a new branch and rebase it onto asf-site 
first; that avoids this issue. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2670) Fix broken relative links

2021-11-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2670:
-
Labels: pull-request-available  (was: )

> Fix broken relative links
> -
>
> Key: HUDI-2670
> URL: https://issues.apache.org/jira/browse/HUDI-2670
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Kyle Weller
>Assignee: Kyle Weller
>Priority: Minor
>  Labels: pull-request-available
>
> A few relative links were broken in the last PR since the newly generated 
> docs were not available in the 0.9.0 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vinothchandar commented on pull request #3907: [HUDI-2670] - relative links broken in docs

2021-11-02 Thread GitBox


vinothchandar commented on pull request #3907:
URL: https://github.com/apache/hudi/pull/3907#issuecomment-958653341


   Seems to build locally. Landing to make asf-site green again. 
   
   Probably we need a better solution? cc @vingov 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2509) OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with some null value column

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-2509:
--
Status: Patch Available  (was: In Progress)

> OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with 
> some null value column
> ---
>
> Key: HUDI-2509
> URL: https://issues.apache.org/jira/browse/HUDI-2509
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Sagar Sumit
>Assignee: Adam Z CHEN
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
>
> https://github.com/apache/hudi/issues/3735



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2509) OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with some null value column

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-2509:
--
Status: In Progress  (was: Open)

> OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with 
> some null value column
> ---
>
> Key: HUDI-2509
> URL: https://issues.apache.org/jira/browse/HUDI-2509
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Sagar Sumit
>Assignee: Adam Z CHEN
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
>
> https://github.com/apache/hudi/issues/3735



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1976:
--
Status: Patch Available  (was: In Progress)

> Upgrade hive, jackson, log4j, hadoop to remove vulnerability
> 
>
> Key: HUDI-1976
> URL: https://issues.apache.org/jira/browse/HUDI-1976
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Blocker
>  Labels: pull-request-available, sev:high
> Fix For: 0.10.0
>
>
> [https://github.com/apache/hudi/issues/2827]
> [https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2824|https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2823|https://github.com/apache/hudi/issues/2826]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1975) Upgrade java-prometheus-client from 3.1.2 to 4.x

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1975:
--
Status: Patch Available  (was: In Progress)

> Upgrade java-prometheus-client from 3.1.2 to 4.x
> 
>
> Key: HUDI-1975
> URL: https://issues.apache.org/jira/browse/HUDI-1975
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Blocker
>  Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>
> Find more details here -> https://github.com/apache/hudi/issues/2774



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1864) Support for java.time.LocalDate in TimestampBasedAvroKeyGenerator

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-1864:
--
Status: Patch Available  (was: In Progress)

> Support for java.time.LocalDate in TimestampBasedAvroKeyGenerator
> -
>
> Key: HUDI-1864
> URL: https://issues.apache.org/jira/browse/HUDI-1864
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Vaibhav Sinha
>Assignee: Vaibhav Sinha
>Priority: Major
>  Labels: pull-request-available, sev:high
> Fix For: 0.10.0
>
>
> When we read data from MySQL that has a column of type {{Date}}, Spark 
> represents it as an instance of {{java.time.LocalDate}}. If I try to use 
> this column for partitioning while writing to Hudi, I get the following 
> exception
>  
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieKeyGeneratorException: Unable to 
> parse input partition field :2021-04-21
>   at 
> org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator.getPartitionPath(TimestampBasedAvroKeyGenerator.java:136)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.keygen.CustomAvroKeyGenerator.getPartitionPath(CustomAvroKeyGenerator.java:89)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.keygen.CustomKeyGenerator.getPartitionPath(CustomKeyGenerator.java:64)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.keygen.BaseKeyGenerator.getKey(BaseKeyGenerator.java:62) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$write$2(HoodieSparkSqlWriter.scala:160)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.Iterator$SliceIterator.next(Iterator.scala:271) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.Iterator.foreach(Iterator.scala:941) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.Iterator.foreach$(Iterator.scala:941) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) 
> ~[scala-library-2.12.10.jar:?]
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) 
> ~[scala-library-2.12.10.jar:?]
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.TraversableOnce.to(TraversableOnce.scala:315) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.AbstractIterator.to(Iterator.scala:1429) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307) 
> ~[scala-library-2.12.10.jar:?]
>   at 
> scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1429) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288) 
> ~[scala-library-2.12.10.jar:?]
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1429) 
> ~[scala-library-2.12.10.jar:?]
>   at org.apache.spark.rdd.RDD.$anonfun$take$2(RDD.scala:1449) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2242) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.scheduler.Task.run(Task.scala:131) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
>  ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500) 
> ~[spark-core_2.12-3.1.1.jar:3.1.1]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_171]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_171]
>   at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_171]
> Caused by: org.apache.h
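The stack trace above boils down to a string-parsing mismatch: Spark hands the key generator a `java.time.LocalDate`, whose ISO-8601 string form ("2021-04-21") does not match the timestamp pattern the generator expects. A minimal standalone sketch of the explicit formatting that sidesteps the failure (illustrative only; the class name and output pattern are assumptions, not Hudi's actual fix):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class LocalDatePartitionSketch {
    public static void main(String[] args) {
        // Spark surfaces a MySQL DATE column as java.time.LocalDate,
        // whose toString() is ISO-8601, e.g. "2021-04-21".
        LocalDate value = LocalDate.parse("2021-04-21");

        // Formatting the date explicitly into the configured output
        // pattern avoids handing the timestamp-based key generator a
        // string it cannot parse.
        String partitionPath = value.format(DateTimeFormatter.ofPattern("yyyy/MM/dd"));
        System.out.println(partitionPath); // prints 2021/04/21
    }
}
```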

[jira] [Updated] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2021-11-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-83:

Status: Patch Available  (was: In Progress)

> Map Timestamp type in spark to corresponding Timestamp type in Hive during 
> Hive sync
> 
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration, Usability
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: cdmikechen
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.10.0
>
>
> [https://github.com/apache/incubator-hudi/issues/543] & related issues 
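The underlying problem is a missing type-mapping rule during Hive sync: without one, a Spark `TimestampType` column surfaces in Hive with a non-timestamp type. A toy sketch of the kind of mapping table the sync step needs (hypothetical names; not Hudi's actual converter):

```java
import java.util.Map;

public class SparkToHiveTypeSketch {
    // Hypothetical mapping from Spark SQL type names to Hive column types.
    static final Map<String, String> SPARK_TO_HIVE = Map.of(
        "IntegerType", "int",
        "LongType", "bigint",
        "StringType", "string",
        "DateType", "date",
        "TimestampType", "timestamp" // the fix: map to Hive timestamp, not bigint
    );

    public static void main(String[] args) {
        System.out.println(SPARK_TO_HIVE.get("TimestampType")); // prints timestamp
    }
}
```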





[GitHub] [hudi] hudi-bot edited a comment on pull request #3486: [HUDI-2314] Add support for DynamoDb based lock

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3486:
URL: https://github.com/apache/hudi/pull/3486#issuecomment-899911684


   
   ## CI report:
   
   * d2b00796c9564088aa8533431c73251993f688d4 UNKNOWN
   * 99853468aec1becd1112c0ffba6ccf5f604e713d UNKNOWN
   * 093275425688b2572da5e857899fecbc0c718cf2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3057)
 
   * 210aa90b7cedc691b11d7e146a94ab199874ae50 UNKNOWN
   * 0876ffb9762eda7a914e0ac8978284726cc0b267 UNKNOWN
   * 248651db6b258d501cc0c5752a9b888137bff669 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3888: [HUDI-2624] Implement Non Index type for HUDI

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3888:
URL: https://github.com/apache/hudi/pull/3888#issuecomment-954503596


   
   ## CI report:
   
   * 0bb6cf636d6a4e9e902706a28364845a7609e38d Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3086)
 
   * f20758076b7fd9355a6b3075fc03d93982b80cc9 UNKNOWN
   * 4efd2f4b2a47c7417aa6dc84ef40162637448fdf UNKNOWN
   * 49a813a067fccf3bad68648b7b0164d7d36a5947 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3092)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3888: [HUDI-2624] Implement Non Index type for HUDI

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3888:
URL: https://github.com/apache/hudi/pull/3888#issuecomment-954503596


   
   ## CI report:
   
   * 0bb6cf636d6a4e9e902706a28364845a7609e38d Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3086)
 
   * f20758076b7fd9355a6b3075fc03d93982b80cc9 UNKNOWN
   * 4efd2f4b2a47c7417aa6dc84ef40162637448fdf UNKNOWN
   * 49a813a067fccf3bad68648b7b0164d7d36a5947 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3486: [HUDI-2314] Add support for DynamoDb based lock

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3486:
URL: https://github.com/apache/hudi/pull/3486#issuecomment-899911684


   
   ## CI report:
   
   * d2b00796c9564088aa8533431c73251993f688d4 UNKNOWN
   * 99853468aec1becd1112c0ffba6ccf5f604e713d UNKNOWN
   * 093275425688b2572da5e857899fecbc0c718cf2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3057)
 
   * 210aa90b7cedc691b11d7e146a94ab199874ae50 UNKNOWN
   * 0876ffb9762eda7a914e0ac8978284726cc0b267 UNKNOWN
   
   




[GitHub] [hudi] prashantwason commented on pull request #3871: [HUDI-2593][WIP] Enabling virtual keys for the metadata table

2021-11-02 Thread GitBox


prashantwason commented on pull request #3871:
URL: https://github.com/apache/hudi/pull/3871#issuecomment-958638352


   But partition paths for the metadata table are hardcoded. Can that be helpful? Removing the fields will save a lot of storage space for the record-level index.






[GitHub] [hudi] zhedoubushishi commented on a change in pull request #3486: [HUDI-2314] Add support for DynamoDb based lock

2021-11-02 Thread GitBox


zhedoubushishi commented on a change in pull request #3486:
URL: https://github.com/apache/hudi/pull/3486#discussion_r741605659



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/DynamoDBBasedLockProvider.java
##
@@ -0,0 +1,226 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.transaction.lock;
+
+import com.amazonaws.client.builder.AwsClientBuilder;
+import com.amazonaws.regions.RegionUtils;
+import com.amazonaws.services.dynamodbv2.AcquireLockOptions;
+import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
+import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
+import com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClient;
+import com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClientOptions;
+import com.amazonaws.services.dynamodbv2.LockItem;
+import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
+import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
+import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
+import com.amazonaws.services.dynamodbv2.model.KeyType;
+import com.amazonaws.services.dynamodbv2.model.LockNotGrantedException;
+import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
+import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;
+import com.amazonaws.services.dynamodbv2.util.TableUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hudi.aws.HoodieAWSCredentialsProviderFactory;
+import org.apache.hudi.common.config.LockConfiguration;
+import org.apache.hudi.common.lock.LockProvider;
+import org.apache.hudi.common.lock.LockState;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.exception.HoodieLockException;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import javax.annotation.concurrent.NotThreadSafe;
+
+import static org.apache.hudi.common.config.LockConfiguration.DYNAMODB_BILLING_MODE_PROP_KEY;
+import static org.apache.hudi.common.config.LockConfiguration.DYNAMODB_PARTITION_KEY_PROP_KEY;
+import static org.apache.hudi.common.config.LockConfiguration.DYNAMODB_READ_CAPACITY_PROP_KEY;
+import static org.apache.hudi.common.config.LockConfiguration.DYNAMODB_REGION_PROP_KEY;
+import static org.apache.hudi.common.config.LockConfiguration.DYNAMODB_TABLE_NAME_PROP_KEY;
+import static org.apache.hudi.common.config.LockConfiguration.DYNAMODB_WRITE_CAPACITY_PROP_KEY;
+import static org.apache.hudi.common.config.LockConfiguration.LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY;
+
+/**
+ * A DynamoDB based lock. This {@link LockProvider} implementation allows to lock table operations
+ * using DynamoDB. Users need to have access to AWS DynamoDB to be able to use this lock.
+ */
+@NotThreadSafe
+public class DynamoDBBasedLockProvider implements LockProvider {
+
+  private static final Logger LOG = LogManager.getLogger(DynamoDBBasedLockProvider.class);
+
+  private final AmazonDynamoDBLockClient client;
+  private final long leaseDuration;
+  private final String tableName;
+  protected LockConfiguration lockConfiguration;
+  private volatile LockItem lock;
+
+  public DynamoDBBasedLockProvider(final LockConfiguration lockConfiguration, final Configuration conf) {
+    checkRequiredProps(lockConfiguration);
+    this.lockConfiguration = lockConfiguration;
+    this.tableName = lockConfiguration.getConfig().getString(DYNAMODB_TABLE_NAME_PROP_KEY);
+    this.leaseDuration = Long.parseLong(lockConfiguration.getConfig().getString(LOCK_ACQUIRE_WAIT_TIMEOUT_MS_PROP_KEY));
+    AmazonDynamoDB dynamoDB = getDynamoClient();
+    // build the dynamoDb lock client
+    this.client = new AmazonDynamoDBLockClient(
+        AmazonDynamoDBLockClientOptions.builder(dynamoDB, tableName)
+            .withTimeUnit(TimeUnit.MILLISECONDS)
+            .withLeaseDuration(leaseDuration)
+            .withHeartbeatPeriod(leaseDuration / 3)
+            .withCreateHeartbeatBackgroundThread(true)
+
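One detail worth noting in the constructor above: the heartbeat period is derived as `leaseDuration / 3`, so the background thread renews the lock several times per lease window and a single missed heartbeat does not lose the lock. A dependency-free sketch of that timing rule (the concrete lease value here is an assumption for illustration):

```java
public class LeaseHeartbeatSketch {
    public static void main(String[] args) {
        long leaseDurationMs = 60_000L;               // e.g. configured lock acquire timeout
        long heartbeatPeriodMs = leaseDurationMs / 3; // renew three times per lease window
        // Renewal lands well before expiry: even if one heartbeat is
        // missed, two more fit inside the remaining lease.
        assert heartbeatPeriodMs * 3 <= leaseDurationMs;
        System.out.println(heartbeatPeriodMs); // prints 20000
    }
}
```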

[GitHub] [hudi] hudi-bot edited a comment on pull request #3899: [HUDI-2660] Delete the view storage properties first before creation

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3899:
URL: https://github.com/apache/hudi/pull/3899#issuecomment-956165515


   
   ## CI report:
   
   * 245ea82852227fb3bd29aa389a64ec4f291afb0f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3006)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3010)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3038)
 
   * c30db533861087c73d6d71e68cc6fdc00985803b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3090)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3911: [HUDI-2676] Hudi should synchronize owner information to hudi _rt/_ro…

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3911:
URL: https://github.com/apache/hudi/pull/3911#issuecomment-958636680


   
   ## CI report:
   
   * 90b58a3afad964af9d252a3633b555a21253df7d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3091)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * 61156c4e958c1b20c3479a55ef71f2e11891398a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3068)
 
   * a3677e66a1fb13c1a91d6beb977b00ddfdd6a51e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3089)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3486: [HUDI-2314] Add support for DynamoDb based lock

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3486:
URL: https://github.com/apache/hudi/pull/3486#issuecomment-899911684


   
   ## CI report:
   
   * d2b00796c9564088aa8533431c73251993f688d4 UNKNOWN
   * 99853468aec1becd1112c0ffba6ccf5f604e713d UNKNOWN
   * 093275425688b2572da5e857899fecbc0c718cf2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3057)
 
   * 210aa90b7cedc691b11d7e146a94ab199874ae50 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3911: [HUDI-2676] Hudi should synchronize owner information to hudi _rt/_ro…

2021-11-02 Thread GitBox


hudi-bot commented on pull request #3911:
URL: https://github.com/apache/hudi/pull/3911#issuecomment-958636680


   
   ## CI report:
   
   * 90b58a3afad964af9d252a3633b555a21253df7d UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3899: [HUDI-2660] Delete the view storage properties first before creation

2021-11-02 Thread GitBox


hudi-bot edited a comment on pull request #3899:
URL: https://github.com/apache/hudi/pull/3899#issuecomment-956165515


   
   ## CI report:
   
   * 245ea82852227fb3bd29aa389a64ec4f291afb0f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3006)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3010)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3038)
 
   * c30db533861087c73d6d71e68cc6fdc00985803b UNKNOWN
   
   




[GitHub] [hudi] zhedoubushishi commented on a change in pull request #3486: [HUDI-2314] Add support for DynamoDb based lock

2021-11-02 Thread GitBox


zhedoubushishi commented on a change in pull request #3486:
URL: https://github.com/apache/hudi/pull/3486#discussion_r741604415



##
File path: hudi-client/hudi-client-common/pom.xml
##
@@ -218,6 +222,27 @@
   ${zk-curator.version}
   test
 
+
+  com.amazonaws

Review comment:
   Done. Created ```hudi-aws``` module







