[GitHub] [hudi] wuwenchi commented on a diff in pull request #6539: [HUDI-4739] Wrong value returned when key's length equals 1

2022-08-29 Thread GitBox


wuwenchi commented on code in PR #6539:
URL: https://github.com/apache/hudi/pull/6539#discussion_r958074279


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##
@@ -73,21 +73,16 @@ public static String getPartitionPathFromGenericRecord(GenericRecord genericReco
    */
   public static String[] extractRecordKeys(String recordKey) {
     String[] fieldKV = recordKey.split(",");
-    if (fieldKV.length == 1) {
-      return fieldKV;
-    } else {
-      // a complex key
-      return Arrays.stream(fieldKV).map(kv -> {
-        final String[] kvArray = kv.split(":");
-        if (kvArray[1].equals(NULL_RECORDKEY_PLACEHOLDER)) {
-          return null;
-        } else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) {
-          return "";
-        } else {
-          return kvArray[1];
-        }
-      }).toArray(String[]::new);
-    }
+    return Arrays.stream(fieldKV).map(kv -> {
+      final String[] kvArray = kv.split(":");
+      if (kvArray[1].equals(NULL_RECORDKEY_PLACEHOLDER)) {
+        return null;
+      } else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) {
+        return "";
+      } else {
+        return kvArray[1];
+      }
+    }).toArray(String[]::new);

Review Comment:
   ok



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] Zhangshunyu closed issue #6528: [SUPPORT]How to clean the compacted .log and .hfiles in metadata?

2022-08-29 Thread GitBox


Zhangshunyu closed issue #6528: [SUPPORT]How to clean the compacted .log and 
.hfiles in metadata?
URL: https://github.com/apache/hudi/issues/6528





[GitHub] [hudi] yihua commented on pull request #6506: Allow hoodie read client to choose index

2022-08-29 Thread GitBox


yihua commented on PR #6506:
URL: https://github.com/apache/hudi/pull/6506#issuecomment-1231181637

   @parisni Could you also add the Jira ticket number?





[GitHub] [hudi] yihua commented on a diff in pull request #6506: Allow hoodie read client to choose index

2022-08-29 Thread GitBox


yihua commented on code in PR #6506:
URL: https://github.com/apache/hudi/pull/6506#discussion_r958036128


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/HoodieReadClient.java:
##
@@ -92,6 +92,18 @@ public HoodieReadClient(HoodieSparkEngineContext context, String basePath, SQLCo
     this.sqlContextOpt = Option.of(sqlContext);
   }
 
+  /**
+   * @param context

Review Comment:
   nit: add meaningful docs here?
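   For illustration, one way the empty tags could be filled in (the wording below is this editor's assumption, not taken from the PR):

```java
/**
 * Creates a read client that performs index lookups with the given index type.
 *
 * @param context    Spark engine context used for distributed operations
 * @param basePath   base path of the Hudi table
 * @param sqlContext SQL context used to read the table
 * @param indexType  type of the index to instantiate for key lookups
 */
```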



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/HoodieReadClient.java:
##
@@ -92,6 +92,18 @@ public HoodieReadClient(HoodieSparkEngineContext context, String basePath, SQLCo
     this.sqlContextOpt = Option.of(sqlContext);
   }
 
+  /**
+   * @param context
+   * @param basePath
+   * @param sqlContext
+   * @param indexType
+   */
+  public HoodieReadClient(HoodieSparkEngineContext context, String basePath, SQLContext sqlContext, HoodieIndex.IndexType indexType) {

Review Comment:
   Is this going to be used in any query engine?






[GitHub] [hudi] hudi-bot commented on pull request #6537: Avoid update metastore schema if only missing column in input

2022-08-29 Thread GitBox


hudi-bot commented on PR #6537:
URL: https://github.com/apache/hudi/pull/6537#issuecomment-1231174900

   
   ## CI report:
   
   * 9e63b76454a06d57a141ad4b844752abb346d3fa UNKNOWN
   * 00b9224ec8c49e83ca51d52351c782083a4fba84 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11030)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] yihua commented on pull request #6515: [HUDI-4730] FIX Batch job cannot clean old commits&data files in clea…

2022-08-29 Thread GitBox


yihua commented on PR #6515:
URL: https://github.com/apache/hudi/pull/6515#issuecomment-1231174530

   @danny0405 @XuQianJin-Stars is this PR good for merging?





[jira] [Assigned] (HUDI-4718) Hudi cli does not support Kerberized Hadoop cluster

2022-08-29 Thread xi chaomin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xi chaomin reassigned HUDI-4718:


Assignee: Yao Zhang

> Hudi cli does not support Kerberized Hadoop cluster
> ---
>
> Key: HUDI-4718
> URL: https://issues.apache.org/jira/browse/HUDI-4718
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: cli
>Reporter: Yao Zhang
>Assignee: Yao Zhang
>Priority: Major
> Fix For: 0.13.0
>
>
> Hudi cli connect command cannot read table from Kerberized Hadoop cluster and 
> there is no way to perform Kerberos authentication. 
> I plan to add this feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] yihua commented on pull request #6535: [HUDI-4193] change protoc version so it compiles on m1 mac

2022-08-29 Thread GitBox


yihua commented on PR #6535:
URL: https://github.com/apache/hudi/pull/6535#issuecomment-1231148093

   @xushiyan @nsivabalan There are two other PRs fixing the build issue around 
protoc: #6455 #5757.  Shall we decide the approach here and land only one of 
these?





[GitHub] [hudi] yihua commented on a diff in pull request #6539: [HUDI-4739] Wrong value returned when key's length equals 1

2022-08-29 Thread GitBox


yihua commented on code in PR #6539:
URL: https://github.com/apache/hudi/pull/6539#discussion_r958009785


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java:
##
@@ -73,21 +73,16 @@ public static String getPartitionPathFromGenericRecord(GenericRecord genericReco
    */
   public static String[] extractRecordKeys(String recordKey) {
     String[] fieldKV = recordKey.split(",");
-    if (fieldKV.length == 1) {
-      return fieldKV;
-    } else {
-      // a complex key
-      return Arrays.stream(fieldKV).map(kv -> {
-        final String[] kvArray = kv.split(":");
-        if (kvArray[1].equals(NULL_RECORDKEY_PLACEHOLDER)) {
-          return null;
-        } else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) {
-          return "";
-        } else {
-          return kvArray[1];
-        }
-      }).toArray(String[]::new);
-    }
+    return Arrays.stream(fieldKV).map(kv -> {
+      final String[] kvArray = kv.split(":");
+      if (kvArray[1].equals(NULL_RECORDKEY_PLACEHOLDER)) {
+        return null;
+      } else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) {
+        return "";
+      } else {
+        return kvArray[1];
+      }
+    }).toArray(String[]::new);

Review Comment:
   @wuwenchi could you add a unit test for the util method considering the 
fixed case?
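   As a reference for that ask, here is a standalone sketch of such a unit test. It re-implements the patched logic locally so it runs on its own; the placeholder constants are assumptions by this editor, not values copied from KeyGenUtils, and a real contribution would call KeyGenUtils.extractRecordKeys from the project's test suite instead.

```java
import java.util.Arrays;

// Standalone sketch of a unit test for the patched extractRecordKeys.
// The placeholder values below are assumptions, not copied from Hudi.
class ExtractRecordKeysTest {
  static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
  static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";

  // Mirrors the patched method: a single "field:value" key now goes
  // through the same split as a complex key.
  static String[] extractRecordKeys(String recordKey) {
    return Arrays.stream(recordKey.split(",")).map(kv -> {
      final String[] kvArray = kv.split(":");
      if (kvArray[1].equals(NULL_RECORDKEY_PLACEHOLDER)) {
        return null;
      } else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) {
        return "";
      }
      return kvArray[1];
    }).toArray(String[]::new);
  }

  public static void main(String[] args) {
    // The fixed case: a single-field key must return the value, not "id:1"
    check(Arrays.equals(extractRecordKeys("id:1"), new String[] {"1"}));
    // A complex key with null/empty placeholders still resolves correctly
    check(Arrays.equals(extractRecordKeys("id:1,name:__null__,addr:__empty__"),
        new String[] {"1", null, ""}));
    System.out.println("ok");
  }

  static void check(boolean cond) {
    if (!cond) throw new AssertionError("extractRecordKeys returned wrong value");
  }
}
```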






[hudi] branch asf-site updated: [DOCS] Update migration_guide.md (#6275)

2022-08-29 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 4fc0d427a0 [DOCS] Update migration_guide.md (#6275)
4fc0d427a0 is described below

commit 4fc0d427a00cd650057c0458e3a596dfb1d58e9d
Author: Manu <36392121+x...@users.noreply.github.com>
AuthorDate: Tue Aug 30 13:00:31 2022 +0800

[DOCS] Update migration_guide.md (#6275)

Co-authored-by: Y Ethan Guo 
---
 website/docs/migration_guide.md           | 42 +-
 .../version-0.11.1/migration_guide.md     | 42 +-
 .../version-0.12.0/migration_guide.md     | 42 +-
 3 files changed, 78 insertions(+), 48 deletions(-)

diff --git a/website/docs/migration_guide.md b/website/docs/migration_guide.md
index e7dd5c29d7..449d65c376 100644
--- a/website/docs/migration_guide.md
+++ b/website/docs/migration_guide.md
@@ -36,8 +36,29 @@ Import your existing table into a Hudi managed table. Since all the data is Hudi
 There are a few options when choosing this approach.
 
 **Option 1**
-Use the HDFSParquetImporter tool. As the name suggests, this only works if your existing table is in parquet file format.
-This tool essentially starts a Spark Job to read the existing parquet table and converts it into a HUDI managed table by re-writing all the data.
+Use the HoodieDeltaStreamer tool. HoodieDeltaStreamer supports bootstrap with --run-bootstrap command line option. There are two types of bootstrap,
+METADATA_ONLY and FULL_RECORD. METADATA_ONLY will generate just skeleton base files with keys/footers, avoiding full cost of rewriting the dataset.
+FULL_RECORD will perform a full copy/rewrite of the data as a Hudi table.
+
+Here is an example for running FULL_RECORD bootstrap and keeping hive style partition with HoodieDeltaStreamer.
+```
+spark-submit --master local \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
+--run-bootstrap \
+--target-base-path /tmp/hoodie/bootstrap_table \
+--target-table bootstrap_table \
+--table-type COPY_ON_WRITE \
+--hoodie-conf hoodie.bootstrap.base.path=/tmp/source_table \
+--hoodie-conf hoodie.datasource.write.recordkey.field=${KEY_FIELD} \
+--hoodie-conf hoodie.datasource.write.partitionpath.field=${PARTITION_FIELD} \
+--hoodie-conf hoodie.datasource.write.precombine.field=${PRECOMBINE_FILED} \
+--hoodie-conf hoodie.bootstrap.keygen.class=org.apache.hudi.keygen.SimpleKeyGenerator \
+--hoodie-conf hoodie.bootstrap.full.input.provider=org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider \
+--hoodie-conf hoodie.bootstrap.mode.selector=org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector \
+--hoodie-conf hoodie.bootstrap.mode.selector.regex.mode=FULL_RECORD \
+--hoodie-conf hoodie.datasource.write.hive_style_partitioning=true
+```
 
 **Option 2**
 For huge tables, this could be as simple as :
@@ -50,21 +71,10 @@ for partition in [list of partitions in source table] {
 
 **Option 3**
 Write your own custom logic of how to load an existing table into a Hudi managed one. Please read about the RDD API
- [here](/docs/quick-start-guide). Using the HDFSParquetImporter Tool. Once hudi has been built via `mvn clean install -DskipTests`, the shell can be
+[here](/docs/quick-start-guide). Using the bootstrap run CLI. Once hudi has been built via `mvn clean install -DskipTests`, the shell can be
 fired by via `cd hudi-cli && ./hudi-cli.sh`.
 
 ```java
-hudi->hdfsparquetimport
---upsert false
---srcPath /user/parquet/table/basepath
---targetPath /user/hoodie/table/basepath
---tableName hoodie_table
---tableType COPY_ON_WRITE
---rowKeyField _row_key
---partitionPathField partitionStr
---parallelism 1500
---schemaFilePath /user/table/schema
---format parquet
---sparkMemory 6g
---retry 2
+hudi->bootstrap run --srcPath /tmp/source_table --targetPath /tmp/hoodie/bootstrap_table --tableName bootstrap_table --tableType COPY_ON_WRITE --rowKeyField ${KEY_FIELD} --partitionPathField ${PARTITION_FIELD} --sparkMaster local --hoodieConfigs hoodie.datasource.write.hive_style_partitioning=true --selectorClass org.apache.hudi.client.bootstrap.selector.FullRecordBootstrapModeSelector
 ```
+Unlike deltaStream, FULL_RECORD or METADATA_ONLY is set with --selectorClass, see detalis with help "bootstrap run".
diff --git a/website/versioned_docs/version-0.11.1/migration_guide.md b/website/versioned_docs/version-0.11.1/migration_guide.md
index e7dd5c29d7..7f5ccf2d9c 100644
--- a/website/versioned_docs/version-0.11.1/migration_guide.md
+++ b/website/versioned_docs/version-0.11.1/migration_guide

[GitHub] [hudi] yihua merged pull request #6275: [DOCS] Update migration_guide.md

2022-08-29 Thread GitBox


yihua merged PR #6275:
URL: https://github.com/apache/hudi/pull/6275





[hudi] branch master updated: [HUDI-4327] Fixing flaky deltastreamer test (testCleanerDeleteReplacedDataWithArchive) (#6533)

2022-08-29 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new a3481efdf0 [HUDI-4327] Fixing flaky deltastreamer test 
(testCleanerDeleteReplacedDataWithArchive) (#6533)
a3481efdf0 is described below

commit a3481efdf076036b613a4be5de0cf0f9dba3aa96
Author: Sivabalan Narayanan 
AuthorDate: Mon Aug 29 21:59:15 2022 -0700

[HUDI-4327] Fixing flaky deltastreamer test 
(testCleanerDeleteReplacedDataWithArchive) (#6533)
---
 .../org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
index 88948b0385..69d6dd7d3b 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
@@ -902,6 +902,7 @@ public class TestHoodieDeltaStreamer extends HoodieDeltaStreamerTestBase {
     cfg.configs.addAll(getAsyncServicesConfigs(totalRecords, "false", "true", "2", "", ""));
     cfg.configs.add(String.format("%s=%s", HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT.key(), "0"));
     cfg.configs.add(String.format("%s=%s", HoodieMetadataConfig.COMPACT_NUM_DELTA_COMMITS.key(), "1"));
+    cfg.configs.add(String.format("%s=%s", HoodieWriteConfig.MARKERS_TYPE.key(), "DIRECT"));
     HoodieDeltaStreamer ds = new HoodieDeltaStreamer(cfg, jsc);
     deltaStreamerTestRunner(ds, cfg, (r) -> {
       TestHelpers.assertAtLeastNReplaceCommits(2, tableBasePath, dfs);
@@ -947,13 +948,14 @@ public class TestHoodieDeltaStreamer extends HoodieDeltaStreamerTestBase {
     assertFalse(replacedFilePaths.isEmpty());
 
     // Step 4 : Insert 1 record and trigger sync/async cleaner and archive.
-    List configs = getAsyncServicesConfigs(1, "true", "true", "2", "", "");
+    List configs = getAsyncServicesConfigs(1, "true", "true", "6", "", "");
     configs.add(String.format("%s=%s", HoodieCleanConfig.CLEANER_POLICY.key(), "KEEP_LATEST_COMMITS"));
     configs.add(String.format("%s=%s", HoodieCleanConfig.CLEANER_COMMITS_RETAINED.key(), "1"));
     configs.add(String.format("%s=%s", HoodieArchivalConfig.MIN_COMMITS_TO_KEEP.key(), "2"));
     configs.add(String.format("%s=%s", HoodieArchivalConfig.MAX_COMMITS_TO_KEEP.key(), "3"));
     configs.add(String.format("%s=%s", HoodieCleanConfig.ASYNC_CLEAN.key(), asyncClean));
     configs.add(String.format("%s=%s", HoodieMetadataConfig.COMPACT_NUM_DELTA_COMMITS.key(), "1"));
+    cfg.configs.add(String.format("%s=%s", HoodieWriteConfig.MARKERS_TYPE.key(), "DIRECT"));
     if (asyncClean) {
       configs.add(String.format("%s=%s", HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(),
           WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.name()));



[GitHub] [hudi] yihua merged pull request #6533: [HUDI-4327] Fixing flaky deltastreamer test (testCleanerDeleteReplacedDataWithArchive)

2022-08-29 Thread GitBox


yihua merged PR #6533:
URL: https://github.com/apache/hudi/pull/6533





[GitHub] [hudi] yihua commented on a diff in pull request #6533: [HUDI-4327] Fixing flaky deltastreamer test (testCleanerDeleteReplacedDataWithArchive)

2022-08-29 Thread GitBox


yihua commented on code in PR #6533:
URL: https://github.com/apache/hudi/pull/6533#discussion_r958006290


##
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java:
##
@@ -902,6 +902,7 @@ public void testCleanerDeleteReplacedDataWithArchive(Boolean asyncClean) throws
     cfg.configs.addAll(getAsyncServicesConfigs(totalRecords, "false", "true", "2", "", ""));
     cfg.configs.add(String.format("%s=%s", HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT.key(), "0"));
     cfg.configs.add(String.format("%s=%s", HoodieMetadataConfig.COMPACT_NUM_DELTA_COMMITS.key(), "1"));
+    cfg.configs.add(String.format("%s=%s", HoodieWriteConfig.MARKERS_TYPE.key(), "DIRECT"));

Review Comment:
   Do we know why the timeline-server-based markers make the test flaky?






[hudi] branch master updated (71b8174058 -> 7c9ceb6370)

2022-08-29 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 71b8174058 [HUDI-4340] fix not parsable text DateTimeParseException by 
addng a method parseDateFromInstantTimeSafely for parsing timestamp when output 
metrics (#6000)
 add 7c9ceb6370 [DOCS] Add docs about 
javax.security.auth.login.LoginException when starting Hudi Sink Connector 
(#6255)

No new revisions were added by this update.

Summary of changes:
 hudi-kafka-connect/README.md | 25 +
 1 file changed, 25 insertions(+)



[GitHub] [hudi] yihua merged pull request #6255: [DOCS] Add doc about javax.security.auth.login.LoginException in Hudi KC Sink

2022-08-29 Thread GitBox


yihua merged PR #6255:
URL: https://github.com/apache/hudi/pull/6255





[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-29 Thread GitBox


hudi-bot commented on PR #6393:
URL: https://github.com/apache/hudi/pull/6393#issuecomment-1231138044

   
   ## CI report:
   
   * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN
   * 0dd2a468fb99ca57ccf6da47dd6baa79b20f7f9d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10978)
   * 525791f3450141706470bb1ac39eb6b8716f3dfc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11036)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-29 Thread GitBox


hudi-bot commented on PR #6393:
URL: https://github.com/apache/hudi/pull/6393#issuecomment-1231135357

   
   ## CI report:
   
   * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN
   * 0dd2a468fb99ca57ccf6da47dd6baa79b20f7f9d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10978)
   * 525791f3450141706470bb1ac39eb6b8716f3dfc UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[hudi] branch asf-site updated: [DOCS] Fix link rendering error in Docker Demo and some other typos (#6083)

2022-08-29 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new b42a13a277 [DOCS] Fix link rendering error in Docker Demo and some 
other typos (#6083)
b42a13a277 is described below

commit b42a13a2776991197f241f8792e3c1f74f05b64e
Author: totoro 
AuthorDate: Tue Aug 30 12:45:35 2022 +0800

[DOCS] Fix link rendering error in Docker Demo and some other typos (#6083)
---
 website/docs/docker_demo.md   | 2 +-
 website/docs/quick-start-guide.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/website/docs/docker_demo.md b/website/docs/docker_demo.md
index 48cee5d507..4a390506c3 100644
--- a/website/docs/docker_demo.md
+++ b/website/docs/docker_demo.md
@@ -16,7 +16,7 @@ The steps have been tested on a Mac laptop
 ### Prerequisites
 
   * Clone the [Hudi repository](https://github.com/apache/hudi) to your local machine.
-  * Docker Setup :  For Mac, Please follow the steps as defined in [https://docs.docker.com/v17.12/docker-for-mac/install/]. For running Spark-SQL queries, please ensure atleast 6 GB and 4 CPUs are allocated to Docker (See Docker -> Preferences -> Advanced). Otherwise, spark-SQL queries could be killed because of memory issues.
+  * Docker Setup :  For Mac, Please follow the steps as defined in [Install Docker Desktop on Mac](https://docs.docker.com/desktop/install/mac-install/). For running Spark-SQL queries, please ensure atleast 6 GB and 4 CPUs are allocated to Docker (See Docker -> Preferences -> Advanced). Otherwise, spark-SQL queries could be killed because of memory issues.
   * kcat : A command-line utility to publish/consume from kafka topics. Use `brew install kcat` to install kcat.
   * /etc/hosts : The demo references many services running in container by the hostname. Add the following settings to /etc/hosts
 
diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index 145c17f843..eb1f3596d2 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -287,7 +287,7 @@ create table hudi_cow_nonpcf_tbl (
 ) using hudi;
 
 
--- create a mor non-partitioned table without preCombineField provided
+-- create a mor non-partitioned table with preCombineField provided
 create table hudi_mor_tbl (
   id int,
   name string,



[GitHub] [hudi] yihua merged pull request #6083: [DOCS] Fix link rendering error in Docker Demo and some other typos

2022-08-29 Thread GitBox


yihua merged PR #6083:
URL: https://github.com/apache/hudi/pull/6083





[GitHub] [hudi] yihua commented on pull request #6083: [DOCS] Fix link rendering error in Docker Demo and some other typos

2022-08-29 Thread GitBox


yihua commented on PR #6083:
URL: https://github.com/apache/hudi/pull/6083#issuecomment-1231130336

   > Hi @yihua, I add a new commit to solve the conflict (seems not work), and 
there are three commits for this PR, do I need to squash these commits?
   
   Don't worry about it.  I'll do the squash and merge so you don't have to 
squash the commits.





[jira] [Updated] (HUDI-4483) Fix checkstyle on scala code and integ-test module

2022-08-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4483:
-
Fix Version/s: 0.12.1
   (was: 0.13.0)

> Fix checkstyle on scala code and integ-test module
> --
>
> Key: HUDI-4483
> URL: https://issues.apache.org/jira/browse/HUDI-4483
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality
>Reporter: Raymond Xu
>Assignee: KnightChess
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> checkstyle does not work on scala code
> see HUDI-4482
> and integration test module
> in GenericRecordFullPayloadGenerator.java
> import com.google.common.annotations.VisibleForTesting;





[jira] [Assigned] (HUDI-4483) Fix checkstyle on scala code and integ-test module

2022-08-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-4483:


Assignee: KnightChess

> Fix checkstyle on scala code and integ-test module
> --
>
> Key: HUDI-4483
> URL: https://issues.apache.org/jira/browse/HUDI-4483
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality
>Reporter: Raymond Xu
>Assignee: KnightChess
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> checkstyle does not work on scala code
> see HUDI-4482
> and integration test module
> in GenericRecordFullPayloadGenerator.java
> import com.google.common.annotations.VisibleForTesting;





[jira] [Closed] (HUDI-4483) Fix checkstyle on scala code and integ-test module

2022-08-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-4483.

Resolution: Fixed

> Fix checkstyle on scala code and integ-test module
> --
>
> Key: HUDI-4483
> URL: https://issues.apache.org/jira/browse/HUDI-4483
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality
>Reporter: Raymond Xu
>Assignee: KnightChess
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>
> checkstyle does not work on scala code
> see HUDI-4482
> and integration test module
> in GenericRecordFullPayloadGenerator.java
> import com.google.common.annotations.VisibleForTesting;





[GitHub] [hudi] si1verwind17 closed issue #6526: [SUPPORT] Unable to sync Hudi with hive metastore

2022-08-29 Thread GitBox


si1verwind17 closed issue #6526: [SUPPORT] Unable to sync Hudi with hive 
metastore
URL: https://github.com/apache/hudi/issues/6526





[GitHub] [hudi] si1verwind17 commented on issue #6526: [SUPPORT] Unable to sync Hudi with hive metastore

2022-08-29 Thread GitBox


si1verwind17 commented on issue #6526:
URL: https://github.com/apache/hudi/issues/6526#issuecomment-1231125320

   I have resolved the error. The problem wasn't from the Hudi side. 
   
   The error below
   `: org.apache.hudi.exception.HoodieException: Could not sync using the meta 
sync class org.apache.hudi.hive.HiveSyncTool`
   
   caused by the other error from Hive Metastore which doesn't recognize scheme 
gs://
   
   So, I solved by putting the gcs connector jar (shaded)  to $HIVE_HOME/lib in 
Remote Hive Metastore





[GitHub] [hudi] wuwenchi commented on pull request #6539: [HUDI-4739] Wrong value returned when key's length equals 1

2022-08-29 Thread GitBox


wuwenchi commented on PR #6539:
URL: https://github.com/apache/hudi/pull/6539#issuecomment-1231120563

   @danny0405  Can you help review it?  Thanks!





[GitHub] [hudi] szknb commented on issue #6530: [SUPPORT] org.apache.hudi.exception.HoodieException: Invalid partition name [2020/01/02, 2020/01/01, 2020/01/03]

2022-08-29 Thread GitBox


szknb commented on issue #6530:
URL: https://github.com/apache/hudi/issues/6530#issuecomment-1231110006

   hudi version: 0.7.0-bd33





[GitHub] [hudi] zhoulii commented on pull request #6083: [DOCS] Fix link rendering error in Docker Demo and some other typos

2022-08-29 Thread GitBox


zhoulii commented on PR #6083:
URL: https://github.com/apache/hudi/pull/6083#issuecomment-1231106323

   Hi @yihua, I added a new commit to resolve the conflict (it doesn't seem to work), and there are now three commits in this PR. Do I need to squash them?





[GitHub] [hudi] liangyu-1 commented on issue #6529: [SUPPORT] jar conflicts about org.apache.hudi.execution.FlinkLazyInsertIterable.getTransformFunction

2022-08-29 Thread GitBox


liangyu-1 commented on issue #6529:
URL: https://github.com/apache/hudi/issues/6529#issuecomment-1231105760

   I figured out that I had imported both the hudi-flink-client jar and the hudi-flink-bundle jar in my project.
   hudi-flink-bundle has shaded org.apache.avro but hudi-flink-client hasn't, hence the conflict.





[GitHub] [hudi] hudi-bot commented on pull request #6537: Avoid update metastore schema if only missing column in input

2022-08-29 Thread GitBox


hudi-bot commented on PR #6537:
URL: https://github.com/apache/hudi/pull/6537#issuecomment-1231104651

   
   ## CI report:
   
   * 9e63b76454a06d57a141ad4b844752abb346d3fa UNKNOWN
   * a245595d0c988610d845f6918fe8c5ea76383e92 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11029)
 
   * 00b9224ec8c49e83ca51d52351c782083a4fba84 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11030)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6541: [HUDI-4740] Add metadata fields for hive catalog #createTable

2022-08-29 Thread GitBox


hudi-bot commented on PR #6541:
URL: https://github.com/apache/hudi/pull/6541#issuecomment-1231102308

   
   ## CI report:
   
   * 1cbef90f645fdaaa68383ac186aedefbbf7da58b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11034)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6542: [MINOR] Fix typo in HoodieArchivalConfig

2022-08-29 Thread GitBox


hudi-bot commented on PR #6542:
URL: https://github.com/apache/hudi/pull/6542#issuecomment-1231102332

   
   ## CI report:
   
   * 93f96405bd8cd6a5486eb0e08e7c08a77214d362 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11035)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6534: [HUDI-4695] Fix flaky TestInlineCompaction#testCompactionRetryOnFailureBasedOnTime

2022-08-29 Thread GitBox


hudi-bot commented on PR #6534:
URL: https://github.com/apache/hudi/pull/6534#issuecomment-1231102263

   
   ## CI report:
   
   * 1a56cdc2bc53917efb33ff786ff14775dd2b526b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11026)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] yihua commented on pull request #960: [WIP] [HUDI-307] Adding type test for timestamp,date & decimal

2022-08-29 Thread GitBox


yihua commented on PR #960:
URL: https://github.com/apache/hudi/pull/960#issuecomment-1231101317

   @arw357 @leesf @vinothchandar @bvaradar this PR has become old :)  Do we still need it?  @nsivabalan @xushiyan does the current set of tests already cover the different types?





[GitHub] [hudi] hudi-bot commented on pull request #6541: [HUDI-4740] Add metadata fields for hive catalog #createTable

2022-08-29 Thread GitBox


hudi-bot commented on PR #6541:
URL: https://github.com/apache/hudi/pull/6541#issuecomment-1231099684

   
   ## CI report:
   
   * 1cbef90f645fdaaa68383ac186aedefbbf7da58b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] yihua commented on pull request #1650: [HUDI-541]: replaced dataFile/df with baseFile/bf throughout code base

2022-08-29 Thread GitBox


yihua commented on PR #1650:
URL: https://github.com/apache/hudi/pull/1650#issuecomment-1231100189

   @pratyakshsharma do you still plan to land this PR given the code base has 
changed since April?





[GitHub] [hudi] zhoulii commented on a diff in pull request #6083: [DOCS] Fix link rendering error in Docker Demo and some other typos

2022-08-29 Thread GitBox


zhoulii commented on code in PR #6083:
URL: https://github.com/apache/hudi/pull/6083#discussion_r957972065


##
website/docs/configurations.md:
##
@@ -3197,7 +3197,7 @@ Configurations that control compaction (merging of log 
files onto a new base fil
 ---
 
 >  hoodie.keep.min.commits
-> Similar to hoodie.keep.max.commits, but controls the minimum number 
ofinstants to retain in the active timeline.
+> Similar to hoodie.keep.max.commits, but controls the minimum number of 
instants to retain in the active timeline.

Review Comment:
   @yihua Thanks for reviewing.






[GitHub] [hudi] hudi-bot commented on pull request #6542: [MINOR] Fix typo in HoodieArchivalConfig

2022-08-29 Thread GitBox


hudi-bot commented on PR #6542:
URL: https://github.com/apache/hudi/pull/6542#issuecomment-1231099705

   
   ## CI report:
   
   * 93f96405bd8cd6a5486eb0e08e7c08a77214d362 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6534: [HUDI-4695] Fix flaky TestInlineCompaction#testCompactionRetryOnFailureBasedOnTime

2022-08-29 Thread GitBox


hudi-bot commented on PR #6534:
URL: https://github.com/apache/hudi/pull/6534#issuecomment-1231099635

   
   ## CI report:
   
   * 1a56cdc2bc53917efb33ff786ff14775dd2b526b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[hudi] branch asf-site updated: [DOCS] Clarification to Docker quickstart demo (#6302)

2022-08-29 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new df4e119bdb [DOCS] Clarification to Docker quickstart demo (#6302)
df4e119bdb is described below

commit df4e119bdbe946d32b037492d8874452e29bf829
Author: Robin Moffatt 
AuthorDate: Tue Aug 30 04:30:43 2022 +0100

[DOCS] Clarification to Docker quickstart demo (#6302)

Co-authored-by: Y Ethan Guo 
---
 website/docs/docker_demo.md | 42 --
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/website/docs/docker_demo.md b/website/docs/docker_demo.md
index 7f56129a1c..48cee5d507 100644
--- a/website/docs/docker_demo.md
+++ b/website/docs/docker_demo.md
@@ -5,15 +5,17 @@ toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
-## A Demo using docker containers
+## A Demo using Docker containers
 
-Lets use a real world example to see how hudi works end to end. For this 
purpose, a self contained
-data infrastructure is brought up in a local docker cluster within your 
computer.
+Let's use a real world example to see how Hudi works end to end. For this 
purpose, a self contained
+data infrastructure is brought up in a local Docker cluster within your 
computer. It requires the
+Hudi repo to have been cloned locally. 
 
 The steps have been tested on a Mac laptop
 
 ### Prerequisites
 
+  * Clone the [Hudi repository](https://github.com/apache/hudi) to your local 
machine.
   * Docker Setup :  For Mac, Please follow the steps as defined in 
[https://docs.docker.com/v17.12/docker-for-mac/install/]. For running Spark-SQL 
queries, please ensure atleast 6 GB and 4 CPUs are allocated to Docker (See 
Docker -> Preferences -> Advanced). Otherwise, spark-SQL queries could be 
killed because of memory issues.
   * kcat : A command-line utility to publish/consume from kafka topics. Use 
`brew install kcat` to install kcat.
   * /etc/hosts : The demo references many services running in container by the 
hostname. Add the following settings to /etc/hosts
@@ -41,7 +43,10 @@ Also, this has not been tested on some environments like 
Docker on Windows.
 
 ### Build Hudi
 
-The first step is to build hudi. **Note** This step builds hudi on default 
supported scala version - 2.11.
+The first step is to build Hudi. **Note** This step builds Hudi on default 
supported scala version - 2.11.
+
+NOTE: Make sure you've cloned the [Hudi 
repository](https://github.com/apache/hudi) first. 
+
 ```java
 cd 
 mvn clean package -Pintegration-tests -DskipTests
@@ -49,8 +54,9 @@ mvn clean package -Pintegration-tests -DskipTests
 
 ### Bringing up Demo Cluster
 
-The next step is to run the docker compose script and setup configs for 
bringing up the cluster.
-This should pull the docker images from docker hub and setup docker cluster.
+The next step is to run the Docker compose script and setup configs for 
bringing up the cluster. These files are in the [Hudi 
repository](https://github.com/apache/hudi) which you should already have 
locally on your machine from the previous steps. 
+
+This should pull the Docker images from Docker hub and setup the Docker 
cluster.
 
 ```java
 cd docker
@@ -112,7 +118,7 @@ Copying spark default config and setting up configs
 $ docker ps
 ```
 
-At this point, the docker cluster will be up and running. The demo cluster 
brings up the following services
+At this point, the Docker cluster will be up and running. The demo cluster 
brings up the following services
 
* HDFS Services (NameNode, DataNode)
* Spark Master and Worker
@@ -1317,13 +1323,13 @@ This brings the demo to an end.
 
 ## Testing Hudi in Local Docker environment
 
-You can bring up a hadoop docker environment containing Hadoop, Hive and Spark 
services with support for hudi.
+You can bring up a Hadoop Docker environment containing Hadoop, Hive and Spark 
services with support for Hudi.
 ```java
 $ mvn pre-integration-test -DskipTests
 ```
-The above command builds docker images for all the services with
+The above command builds Docker images for all the services with
 current Hudi source installed at /var/hoodie/ws and also brings up the 
services using a compose file. We
-currently use Hadoop (v2.8.4), Hive (v2.3.3) and Spark (v2.4.4) in docker 
images.
+currently use Hadoop (v2.8.4), Hive (v2.3.3) and Spark (v2.4.4) in Docker 
images.
 
 To bring down the containers
 ```java
@@ -1331,7 +1337,7 @@ $ cd hudi-integ-test
 $ mvn docker-compose:down
 ```
 
-If you want to bring up the docker containers, use
+If you want to bring up the Docker containers, use
 ```java
 $ cd hudi-integ-test
 $ mvn docker-compose:up -DdetachedMode=true
@@ -1345,21 +1351,21 @@ docker environment (See 
__hudi-integ-test/src/test/java/org/apache/hudi/integ/IT
 
 ### Building Local Docker Containers:
 
-The docker images required for demo and running 

[GitHub] [hudi] yihua merged pull request #6302: [DOCS] Clarification to Docker quickstart demo

2022-08-29 Thread GitBox


yihua merged PR #6302:
URL: https://github.com/apache/hudi/pull/6302





[hudi] branch asf-site updated: [HUDI-4339] Add example configuration for HoodieCleaner in docs (#6326)

2022-08-29 Thread yihua

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 4c8f222070 [HUDI-4339] Add example configuration for HoodieCleaner in 
docs (#6326)
4c8f222070 is described below

commit 4c8f2220700e738cbff44782c1476f98f55d8f3e
Author: Manu <36392121+x...@users.noreply.github.com>
AuthorDate: Tue Aug 30 11:30:15 2022 +0800

[HUDI-4339] Add example configuration for HoodieCleaner in docs (#6326)

Co-authored-by: Y Ethan Guo 
---
 website/docs/hoodie_cleaner.md | 80 ++
 .../version-0.11.1/hoodie_cleaner.md   | 63 +++--
 .../version-0.12.0/hoodie_cleaner.md   | 62 +++--
 3 files changed, 179 insertions(+), 26 deletions(-)

diff --git a/website/docs/hoodie_cleaner.md b/website/docs/hoodie_cleaner.md
index 10f1aa2450..1687a0e065 100644
--- a/website/docs/hoodie_cleaner.md
+++ b/website/docs/hoodie_cleaner.md
@@ -14,15 +14,22 @@ each commit, to delete older file slices. It's recommended 
to leave this enabled
 When cleaning old files, you should be careful not to remove files that are 
being actively used by long running queries.
 Hudi cleaner currently supports the below cleaning policies to keep a certain 
number of commits or file versions:
 
-- **KEEP_LATEST_COMMITS**: This is the default policy. This is a temporal 
cleaning policy that ensures the effect of 
-having lookback into all the changes that happened in the last X commits. 
Suppose a writer is ingesting data 
-into a Hudi dataset every 30 minutes and the longest running query can take 5 
hours to finish, then the user should 
-retain atleast the last 10 commits. With such a configuration, we ensure that 
the oldest version of a file is kept on 
-disk for at least 5 hours, thereby preventing the longest running query from 
failing at any point in time. Incremental cleaning is also possible using this 
policy.
-- **KEEP_LATEST_FILE_VERSIONS**: This policy has the effect of keeping N 
number of file versions irrespective of time. 
-This policy is useful when it is known how many MAX versions of the file does 
one want to keep at any given time. 
-To achieve the same behaviour as before of preventing long running queries 
from failing, one should do their calculations 
-based on data patterns. Alternatively, this policy is also useful if a user 
just wants to maintain 1 latest version of the file.
+- **KEEP_LATEST_COMMITS**: This is the default policy. This is a temporal 
cleaning policy that ensures the effect of
+  having lookback into all the changes that happened in the last X commits. 
Suppose a writer is ingesting data
+  into a Hudi dataset every 30 minutes and the longest running query can take 
5 hours to finish, then the user should
+  retain atleast the last 10 commits. With such a configuration, we ensure 
that the oldest version of a file is kept on
+  disk for at least 5 hours, thereby preventing the longest running query from 
failing at any point in time. Incremental cleaning is also possible using this 
policy.
+  Number of commits to retain can be configured by 
`hoodie.cleaner.commits.retained`.
+
+- **KEEP_LATEST_FILE_VERSIONS**: This policy has the effect of keeping N 
number of file versions irrespective of time.
+  This policy is useful when it is known how many MAX versions of the file 
does one want to keep at any given time.
+  To achieve the same behaviour as before of preventing long running queries 
from failing, one should do their calculations
+  based on data patterns. Alternatively, this policy is also useful if a user 
just wants to maintain 1 latest version of the file.
+  Number of file versions to retain can be configured by 
`hoodie.cleaner.fileversions.retained`.
+
+- **KEEP_LATEST_BY_HOURS**: This policy cleans up based on hours. It is simple 
and useful when you know you want to keep files for a given amount of time.
+  Corresponding to commits with commit times older than the configured number 
of hours to be retained are cleaned.
+  Currently you can configure by parameter `hoodie.cleaner.hours.retained`.
 
 ### Configurations
 For details about all possible configurations and their default values see the 
[configuration 
docs](https://hudi.apache.org/docs/configurations#Compaction-Configs).
@@ -32,12 +39,52 @@ Hoodie Cleaner can be run as a separate process or along 
with your data ingestio
 ingesting data, configs are available which enable you to run it 
[synchronously or 
asynchronously](https://hudi.apache.org/docs/configurations#hoodiecleanasync).
 
 You can use this command for running the cleaner independently:
-```java
-[hoodie]$ spark-submit --class org.apache.hudi.utilities.HoodieCleaner \
-  --props s3:///temp/hudi-ingestion-config/kafka-source.properties \
-  --target-base-path s3:///temp/hudi \
-  --spark-master yarn-clus

[GitHub] [hudi] yihua merged pull request #6326: [HUDI-4339] Add example configuration for HoodieCleaner in docs

2022-08-29 Thread GitBox


yihua merged PR #6326:
URL: https://github.com/apache/hudi/pull/6326





[GitHub] [hudi] xushiyan commented on a diff in pull request #6534: [HUDI-4695] Fix flaky TestInlineCompaction#testCompactionRetryOnFailureBasedOnTime

2022-08-29 Thread GitBox


xushiyan commented on code in PR #6534:
URL: https://github.com/apache/hudi/pull/6534#discussion_r957970551


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/compact/TestInlineCompaction.java:
##
@@ -294,8 +294,9 @@ public void testCompactionRetryOnFailureBasedOnTime() 
throws Exception {
   moveCompactionFromRequestedToInflight(instantTime, cfg);
 }
 
-// When: commit happens after 10s
-HoodieWriteConfig inlineCfg = getConfigForInlineCompaction(5, 10, 
CompactionTriggerStrategy.TIME_ELAPSED);
+// When: commit happens after 1000s. assumption is that, there won't be 
any new compaction getting scheduled within 100s, but the previous failed one 
will be

Review Comment:
   is this gonna add a lot to the running time?
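   For readers following along: `TIME_ELAPSED` triggers a compaction based on the wall-clock time since the last scheduled compaction. A minimal sketch of that check, with illustrative names (this is not the actual Hudi implementation):
   
   ```java
   public class TimeElapsedTriggerSketch {
     // Illustrative check: schedule a compaction only when the elapsed time since
     // the last scheduled compaction exceeds the configured threshold (in seconds).
     static boolean shouldScheduleCompaction(long lastCompactionMillis, long nowMillis,
                                             long thresholdSeconds) {
       return (nowMillis - lastCompactionMillis) / 1000 >= thresholdSeconds;
     }
   
     public static void main(String[] args) {
       long last = 0L;
       // With a 1000s threshold, a commit arriving 100s later schedules nothing new,
       // which lets a test re-attempt a previously failed compaction instead.
       System.out.println(shouldScheduleCompaction(last, 100_000L, 1000L));   // false
       System.out.println(shouldScheduleCompaction(last, 1_000_000L, 1000L)); // true
     }
   }
   ```
   
   Under this sketch, raising the threshold only changes a comparison against timestamps; it does not by itself make the test sleep longer.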






[GitHub] [hudi] yihua commented on a diff in pull request #6083: [DOCS] Fix link rendering error in Docker Demo and some other typos

2022-08-29 Thread GitBox


yihua commented on code in PR #6083:
URL: https://github.com/apache/hudi/pull/6083#discussion_r957970464


##
website/docs/configurations.md:
##
@@ -3197,7 +3197,7 @@ Configurations that control compaction (merging of log 
files onto a new base fil
 ---
 
 >  hoodie.keep.min.commits
-> Similar to hoodie.keep.max.commits, but controls the minimum number 
ofinstants to retain in the active timeline.
+> Similar to hoodie.keep.max.commits, but controls the minimum number of 
instants to retain in the active timeline.

Review Comment:
   I addressed it in #6542.






[GitHub] [hudi] yihua opened a new pull request, #6542: [MINOR] Fix typo in HoodieArchivalConfig

2022-08-29 Thread GitBox


yihua opened a new pull request, #6542:
URL: https://github.com/apache/hudi/pull/6542

   ### Change Logs
   
   As above.
   
   ### Impact
   
   **Risk level: none**
   
   Only updates to config description.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #6536: [HUDI-4736] Fix inflight clean action preventing clean service to continue when multiple cleans are not allowed

2022-08-29 Thread GitBox


hudi-bot commented on PR #6536:
URL: https://github.com/apache/hudi/pull/6536#issuecomment-1231096991

   
   ## CI report:
   
   * dc3daf9826dea5c5b2c09dec9e2b9b0f08048c16 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11028)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6535: [HUDI-4193] change protoc version so it compiles on m1 mac

2022-08-29 Thread GitBox


hudi-bot commented on PR #6535:
URL: https://github.com/apache/hudi/pull/6535#issuecomment-1231096982

   
   ## CI report:
   
   * 4744b46d30c8b9cb57161f63996db85bd15b1dca Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11027)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] xushiyan commented on pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false

2022-08-29 Thread GitBox


xushiyan commented on PR #6347:
URL: https://github.com/apache/hudi/pull/6347#issuecomment-1231096401

   @honeyaya please also simplify the PR title and add details in the Change Logs section.





[GitHub] [hudi] xushiyan commented on a diff in pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false

2022-08-29 Thread GitBox


xushiyan commented on code in PR #6347:
URL: https://github.com/apache/hudi/pull/6347#discussion_r957968992


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java:
##
@@ -69,6 +70,7 @@ public static String getBucketSpec(String bucketCols, int 
bucketNum) {
 
   public HiveSyncConfig(Properties props) {
 super(props);
+validateParameters();

Review Comment:
   since validation is done in constructor. we don't need to check in 
JDBCExecutor either right?



##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java:
##
@@ -191,19 +195,27 @@ public void addPartitionsToTable(String tableName, 
List partitionsToAdd)
 }
 LOG.info("Adding partitions " + partitionsToAdd.size() + " to table " + 
tableName);
 try {
+  
ValidationUtils.checkArgument(syncConfig.getIntOrDefault(HIVE_BATCH_SYNC_PARTITION_NUM)
 > 0,
+  "batch-sync-num for sync hive table must be greater than 0, pls 
check your parameter");

Review Comment:
   then this check can be removed?



##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java:
##
@@ -191,19 +195,27 @@ public void addPartitionsToTable(String tableName, 
List partitionsToAdd)
 }
 LOG.info("Adding partitions " + partitionsToAdd.size() + " to table " + 
tableName);
 try {
+  
ValidationUtils.checkArgument(syncConfig.getIntOrDefault(HIVE_BATCH_SYNC_PARTITION_NUM)
 > 0,
+  "batch-sync-num for sync hive table must be greater than 0, pls 
check your parameter");
   StorageDescriptor sd = client.getTable(databaseName, tableName).getSd();
-  List partitionList = partitionsToAdd.stream().map(partition 
-> {
-StorageDescriptor partitionSd = new StorageDescriptor();
-partitionSd.setCols(sd.getCols());
-partitionSd.setInputFormat(sd.getInputFormat());
-partitionSd.setOutputFormat(sd.getOutputFormat());
-partitionSd.setSerdeInfo(sd.getSerdeInfo());
-String fullPartitionPath = 
FSUtils.getPartitionPath(syncConfig.getString(META_SYNC_BASE_PATH), 
partition).toString();
-List partitionValues = 
partitionValueExtractor.extractPartitionValuesInPath(partition);
-partitionSd.setLocation(fullPartitionPath);
-return new Partition(partitionValues, databaseName, tableName, 0, 0, 
partitionSd, null);
-  }).collect(Collectors.toList());
-  client.add_partitions(partitionList, true, false);
+  List partitionList = new ArrayList<>();

Review Comment:
   let's not re-use the same variable. create new var for each batch
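   The batch split suggested here can be sketched generically. `toBatches` is a hypothetical helper for illustration, not the code in this PR; it creates a fresh list per batch instead of re-using one variable:
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   
   public class BatchSyncSketch {
     // Hypothetical helper: split a list into batches of at most batchSize,
     // creating a new list per batch rather than mutating a shared one.
     static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
       if (batchSize <= 0) {
         throw new IllegalArgumentException("batch size must be greater than 0");
       }
       List<List<T>> batches = new ArrayList<>();
       for (int i = 0; i < items.size(); i += batchSize) {
         // Copy the subList view so each batch is an independent list.
         batches.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
       }
       return batches;
     }
   
     public static void main(String[] args) {
       List<String> partitions = Arrays.asList(
           "2020/01/01", "2020/01/02", "2020/01/03", "2020/01/04", "2020/01/05");
       // Each batch would then be registered with the metastore in its own call.
       System.out.println(toBatches(partitions, 2).size()); // prints 3
     }
   }
   ```
   
   Syncing each small batch in its own metastore call is what avoids the single oversized request that can time out.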






[GitHub] [hudi] yihua commented on a diff in pull request #6083: [DOCS] Fix link rendering error in Docker Demo and some other typos

2022-08-29 Thread GitBox


yihua commented on code in PR #6083:
URL: https://github.com/apache/hudi/pull/6083#discussion_r957967450


##
website/docs/configurations.md:
##
@@ -3197,7 +3197,7 @@ Configurations that control compaction (merging of log 
files onto a new base fil
 ---
 
 >  hoodie.keep.min.commits
-> Similar to hoodie.keep.max.commits, but controls the minimum number 
ofinstants to retain in the active timeline.
+> Similar to hoodie.keep.max.commits, but controls the minimum number of 
instants to retain in the active timeline.

Review Comment:
   Please refrain from changing the `configurations.md` directly.  This is 
automatically generated and updated based on the Hudi config classes.






[GitHub] [hudi] LinMingQiang commented on a diff in pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-29 Thread GitBox


LinMingQiang commented on code in PR #6393:
URL: https://github.com/apache/hudi/pull/6393#discussion_r957966434


##
hudi-timeline-service/src/test/java/org/apache/hudi/timeline/service/functional/TestRemoteHoodieTableFileSystemView.java:
##
@@ -64,4 +66,31 @@ protected SyncableFileSystemView 
getFileSystemView(HoodieTimeline timeline) {
 view = new RemoteHoodieTableFileSystemView("localhost", 
server.getServerPort(), metaClient);
 return view;
   }
+
+  @Test
+  public void testRemoteHoodieTableFileSystemViewWithRetry() {
+// Service is available.
+view.getLatestBaseFiles();
+// Shut down the service.
+server.close();
+try {
+  // Immediately fails and throws a connection refused exception.
+  view.getLatestBaseFiles();
+} catch (HoodieRemoteException e) {
+  assert e.getMessage().contains("Connection refused (Connection 
refused)");
+}
+// Enable API request retry for remote file system view.
+view =  new RemoteHoodieTableFileSystemView(metaClient, 
FileSystemViewStorageConfig
+.newBuilder()
+.withRemoteServerHost("localhost")
+.withRemoteServerPort(server.getServerPort())
+.withRemoteTimelineClientRetry(true)
+.withRemoteTimelineClientMaxRetryNumbers(4)
+.build());
+try {
+  view.getLatestBaseFiles();

Review Comment:
   > is it not possible to test that the retry succeeds after 2 or 3 tries?
   
   I can create a Thread to restart the service.



##
hudi-timeline-service/src/test/java/org/apache/hudi/timeline/service/functional/TestRemoteHoodieTableFileSystemView.java:
##
@@ -64,4 +66,31 @@ protected SyncableFileSystemView 
getFileSystemView(HoodieTimeline timeline) {
 view = new RemoteHoodieTableFileSystemView("localhost", 
server.getServerPort(), metaClient);
 return view;
   }
+
+  @Test
+  public void testRemoteHoodieTableFileSystemViewWithRetry() {
+// Service is available.
+view.getLatestBaseFiles();
+// Shut down the service.
+server.close();
+try {
+  // Immediately fails and throws a connection refused exception.
+  view.getLatestBaseFiles();
+} catch (HoodieRemoteException e) {
+  assert e.getMessage().contains("Connection refused (Connection 
refused)");
+}
+// Enable API request retry for remote file system view.
+view =  new RemoteHoodieTableFileSystemView(metaClient, 
FileSystemViewStorageConfig
+.newBuilder()
+.withRemoteServerHost("localhost")
+.withRemoteServerPort(server.getServerPort())
+.withRemoteTimelineClientRetry(true)
+.withRemoteTimelineClientMaxRetryNumbers(4)
+.build());
+try {
+  view.getLatestBaseFiles();

Review Comment:
   Done!






[GitHub] [hudi] LinMingQiang commented on a diff in pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…

2022-08-29 Thread GitBox


LinMingQiang commented on code in PR #6393:
URL: https://github.com/apache/hudi/pull/6393#discussion_r957966434


##
hudi-timeline-service/src/test/java/org/apache/hudi/timeline/service/functional/TestRemoteHoodieTableFileSystemView.java:
##
@@ -64,4 +66,31 @@ protected SyncableFileSystemView 
getFileSystemView(HoodieTimeline timeline) {
 view = new RemoteHoodieTableFileSystemView("localhost", 
server.getServerPort(), metaClient);
 return view;
   }
+
+  @Test
+  public void testRemoteHoodieTableFileSystemViewWithRetry() {
+// Service is available.
+view.getLatestBaseFiles();
+// Shut down the service.
+server.close();
+try {
+  // Immediately fails and throws a connection refused exception.
+  view.getLatestBaseFiles();
+} catch (HoodieRemoteException e) {
+  assert e.getMessage().contains("Connection refused (Connection 
refused)");
+}
+// Enable API request retry for remote file system view.
+view =  new RemoteHoodieTableFileSystemView(metaClient, 
FileSystemViewStorageConfig
+.newBuilder()
+.withRemoteServerHost("localhost")
+.withRemoteServerPort(server.getServerPort())
+.withRemoteTimelineClientRetry(true)
+.withRemoteTimelineClientMaxRetryNumbers(4)
+.build());
+try {
+  view.getLatestBaseFiles();

Review Comment:
   > is it not possible to test that the retry succeeds after 2 or 3 tries?
   
   I can start a thread to restart the service.
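The approach described here — shut the service down, schedule a background restart, and assert that the client's retry loop eventually succeeds — can be sketched independently of Hudi. Everything below (the `serviceUp` flag, `callService`, `callWithRetry`) is a hypothetical stand-in for the timeline server and the retry-enabled `RemoteHoodieTableFileSystemView`, not actual Hudi API:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RetryRestartSketch {
    // Hypothetical stand-in for the timeline service's availability.
    static volatile boolean serviceUp = false;

    // Fails with "Connection refused" while the service is down.
    static String callService() {
        if (!serviceUp) {
            throw new IllegalStateException("Connection refused");
        }
        return "ok";
    }

    // Simple bounded retry loop, analogous in spirit to enabling
    // withRemoteTimelineClientRetry on the file system view.
    static String callWithRetry(int maxRetries, long waitMs) throws InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return callService();
            } catch (IllegalStateException e) {
                if (attempt >= maxRetries) {
                    throw e;
                }
                Thread.sleep(waitMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        // Restart the "service" from another thread after a delay, so the
        // retry loop fails a couple of times and then succeeds.
        exec.schedule(() -> serviceUp = true, 150, TimeUnit.MILLISECONDS);
        System.out.println(callWithRetry(4, 100));
        exec.shutdown();
    }
}
```

In a real test the scheduled task would restart the timeline server rather than flip a flag, but the shape of the assertion is the same: the first attempts fail, a later one succeeds within the configured retry budget.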






[GitHub] [hudi] xiarixiaoyao commented on issue #6496: [SUPPORT] Hudi schema evolution, Null for oldest values

2022-08-29 Thread GitBox


xiarixiaoyao commented on issue #6496:
URL: https://github.com/apache/hudi/issues/6496#issuecomment-1231091867

   @Armelabdelkbir if you have this requirement for Spark 3.1.x, please raise a PR, and I will fix it as soon as possible





[GitHub] [hudi] yihua commented on a diff in pull request #6302: [DOCS] Clarification to Docker quickstart demo

2022-08-29 Thread GitBox


yihua commented on code in PR #6302:
URL: https://github.com/apache/hudi/pull/6302#discussion_r957964460


##
website/docs/docker_demo.md:
##
@@ -112,7 +118,7 @@ Copying spark default config and setting up configs
 $ docker ps
 ```
 
-At this point, the docker cluster will be up and running. The demo cluster 
brings up the following services
+At this point, the Dockercluster will be up and running. The demo cluster 
brings up the following services

Review Comment:
   nit: `Dockercluster` -> `Docker cluster`






[GitHub] [hudi] yihua commented on pull request #6302: [DOCS] Clarification to Docker quickstart demo

2022-08-29 Thread GitBox


yihua commented on PR #6302:
URL: https://github.com/apache/hudi/pull/6302#issuecomment-1231090323

   @rmoff Thanks for your first contribution!





[jira] [Updated] (HUDI-4740) Add metadata fields for hive catalog #createTable

2022-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4740:
-
Labels: pull-request-available  (was: )

> Add metadata fields for hive catalog #createTable
> -
>
> Key: HUDI-4740
> URL: https://issues.apache.org/jira/browse/HUDI-4740
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.12.0
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] danny0405 opened a new pull request, #6541: [HUDI-4740] Add metadata fields for hive catalog #createTable

2022-08-29 Thread GitBox


danny0405 opened a new pull request, #6541:
URL: https://github.com/apache/hudi/pull/6541

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to 
mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-4740) Add metadata fields for hive catalog #createTable

2022-08-29 Thread Danny Chen (Jira)
Danny Chen created HUDI-4740:


 Summary: Add metadata fields for hive catalog #createTable
 Key: HUDI-4740
 URL: https://issues.apache.org/jira/browse/HUDI-4740
 Project: Apache Hudi
  Issue Type: Bug
  Components: flink
Affects Versions: 0.12.0
Reporter: Danny Chen
 Fix For: 0.12.1






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6361: [WIP][HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s

2022-08-29 Thread GitBox


alexeykudinkin commented on code in PR #6361:
URL: https://github.com/apache/hudi/pull/6361#discussion_r957962272


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##
@@ -26,127 +26,163 @@ import org.apache.hudi.hive.HiveSyncConfigHolder
 import org.apache.hudi.sync.common.HoodieSyncConfig
 import org.apache.hudi.{AvroConversionUtils, DataSourceWriteOptions, 
HoodieSparkSqlWriter, SparkAdapterSupport}
 import org.apache.spark.sql._
-import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.analysis.Resolver
 import org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable
-import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, 
AttributeReference, BoundReference, Cast, EqualTo, Expression, Literal}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, 
AttributeReference, BoundReference, EqualTo, Expression, Literal, 
NamedExpression, PredicateHelper}
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.hudi.HoodieSqlCommonUtils._
-import org.apache.spark.sql.hudi.HoodieSqlUtils.getMergeIntoTargetTableId
+import org.apache.spark.sql.hudi.analysis.HoodieAnalysis.failAnalysis
+import 
org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.sameNamedExpr
 import org.apache.spark.sql.hudi.command.payload.ExpressionPayload
 import org.apache.spark.sql.hudi.command.payload.ExpressionPayload._
 import org.apache.spark.sql.hudi.{ProvidesHoodieConfig, SerDeUtils}
 import org.apache.spark.sql.types.{BooleanType, StructType}
 
 import java.util.Base64
 
-
 /**
- * The Command for hoodie MergeIntoTable.
- * The match on condition must contain the row key fields currently, so that 
we can use Hoodie
- * Index to speed up the performance.
+ * Hudi's implementation of the {@code MERGE INTO} (MIT) Spark SQL statement.
+ *
+ * NOTE: That this implementation is restricted in a some aspects to 
accommodate for Hudi's crucial
+ *   constraint (of requiring every record to bear unique primary-key): 
merging condition ([[mergeCondition]])
+ *   is currently can only (and must) reference target table's primary-key 
columns (this is necessary to
+ *   leverage Hudi's upserting capabilities including Indexes)
+ *
+ * Following algorithm is applied:
  *
- * The main algorithm:
+ * 
+ *   Incoming batch ([[sourceTable]]) is reshaped such that it bears 
correspondingly:
+ *   a) (required) "primary-key" column as well as b) (optional) "pre-combine" 
column; this is
+ *   required since MIT statements does not restrict [[sourceTable]]s schema 
to be aligned w/ the
+ *   [[targetTable]]s one, while Hudi's upserting flow expects such columns to 
be present
  *
- * We pushed down all the matched and not matched (condition, assignment) 
expression pairs to the
- * ExpressionPayload. And the matched (condition, assignment) expression pairs 
will execute in the
- * ExpressionPayload#combineAndGetUpdateValue to compute the result record, 
while the not matched
- * expression pairs will execute in the ExpressionPayload#getInsertValue.
+ *   After reshaping we're writing [[sourceTable]] as a normal batch using 
Hudi's upserting
+ *   sequence, where special [[ExpressionPayload]] implementation of the 
[[HoodieRecordPayload]]
+ *   is used allowing us to execute updating, deleting and inserting clauses 
like following:
  *
- * For Mor table, it is a litter complex than this. The matched record also 
goes through the getInsertValue
- * and write append to the log. So the update actions & insert actions should 
process by the same
- * way. We pushed all the update actions & insert actions together to the
- * ExpressionPayload#getInsertValue.
+ * 
+ *   All the matched {@code WHEN MATCHED AND ... THEN (DELETE|UPDATE 
...)} conditional clauses
+ *   will produce [[(condition, expression)]] tuples that will be executed 
w/in the
+ *   [[ExpressionPayload#combineAndGetUpdateValue]] against existing (from 
[[targetTable]]) and
+ *   incoming (from [[sourceTable]]) records producing the updated 
one;
  *
+ *   Not matched {@code WHEN NOT MATCHED AND ... THEN INSERT ...} 
conditional clauses
+ *   will produce [[(condition, expression)]] tuples that will be executed 
w/in [[ExpressionPayload#getInsertValue]]
+ *   against incoming records producing ones to be inserted into target 
table;
+ * 
+ * 
+ *
+ * TODO explain workflow for MOR tables
  */
 case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends 
HoodieLeafRunnableCommand

Review Comment:
   Deleting custom Spark rules uncovered quite a few issues in this 
implementation, unfortunately had to essentially re-write it to address these
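The (condition, assignment) tuple evaluation that the rewritten doc comment describes can be illustrated with a minimal, Hudi-independent sketch. The `Clause` type and the first-match-wins loop below are hypothetical simplifications of what `ExpressionPayload` does with the pushed-down `WHEN MATCHED` clauses, not its actual API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class MergeClauseSketch {
    // Hypothetical (condition, assignment) pair, standing in for the
    // expression tuples MERGE INTO pushes down into ExpressionPayload.
    static final class Clause {
        final Predicate<Map<String, Object>> condition;
        final UnaryOperator<Map<String, Object>> assignment;
        Clause(Predicate<Map<String, Object>> condition,
               UnaryOperator<Map<String, Object>> assignment) {
            this.condition = condition;
            this.assignment = assignment;
        }
    }

    // First clause whose condition holds wins, mirroring how WHEN MATCHED
    // branches are evaluated in order; null means "no clause fired".
    static Map<String, Object> combine(List<Clause> clauses, Map<String, Object> record) {
        for (Clause c : clauses) {
            if (c.condition.test(record)) {
                return c.assignment.apply(record);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Clause> matched = new ArrayList<>();
        // WHEN MATCHED AND qty = 0 THEN DELETE (empty map as a delete sentinel)
        matched.add(new Clause(r -> ((Integer) r.get("qty")) == 0, r -> new HashMap<>()));
        // WHEN MATCHED THEN UPDATE SET flag = 'updated'
        matched.add(new Clause(r -> true, r -> {
            Map<String, Object> updated = new HashMap<>(r);
            updated.put("flag", "updated");
            return updated;
        }));

        Map<String, Object> row = new HashMap<>();
        row.put("qty", 5);
        System.out.println(combine(matched, row).get("flag"));
    }
}
```

In the actual command, the matched tuples run inside `ExpressionPayload#combineAndGetUpdateValue` against the existing and incoming records, and the not-matched tuples run inside `ExpressionPayload#getInsertValue`, as the quoted doc comment explains.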




[hudi] branch asf-site updated (d98c2e1949 -> fb9b036bc6)

2022-08-29 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


from d98c2e1949 [DOCS] Fix typo in compaction.md (#6492)
 add fb9b036bc6 GitHub Actions build asf-site

No new revisions were added by this update.

Summary of changes:
 content/404.html |  8 
 content/404/index.html   |  8 
 content/assets/js/05957343.2f74fb4c.js   |  1 -
 content/assets/js/05957343.bcd954e0.js   |  1 +
 .../assets/js/{10b6d210.6f6cde11.js => 10b6d210.75d4a296.js} |  2 +-
 content/assets/js/3533dbd1.06c06f54.js   |  1 +
 content/assets/js/3533dbd1.5d7383c4.js   |  1 -
 content/assets/js/44e51e65.64758d42.js   |  1 +
 content/assets/js/44e51e65.976444d3.js   |  1 -
 content/assets/js/81d19844.3c1f9c47.js   |  1 -
 content/assets/js/81d19844.766d0401.js   |  1 +
 content/assets/js/85c8b6c7.40f22d8a.js   |  1 +
 content/assets/js/85c8b6c7.fdfdc22c.js   |  1 -
 content/assets/js/ad132b09.68c24193.js   |  1 -
 content/assets/js/ad132b09.d3b8d8db.js   |  1 +
 .../assets/js/{e2d9a3af.082ecc45.js => e2d9a3af.6b144892.js} |  2 +-
 content/assets/js/{main.7651a0bd.js => main.ffe146b0.js} |  4 ++--
 7651a0bd.js.LICENSE.txt => main.ffe146b0.js.LICENSE.txt} |  0
 .../{runtime~main.a89c7360.js => runtime~main.51ac0c85.js}   |  2 +-
 .../The-Case-for-incremental-processing-on-Hadoop/index.html |  8 
 content/blog/2016/12/30/strata-talk-2017/index.html  |  8 
 .../index.html   |  8 
 content/blog/2019/01/18/asf-incubation/index.html|  8 
 content/blog/2019/03/07/batch-vs-incremental/index.html  |  8 
 .../blog/2019/05/14/registering-dataset-to-hive/index.html   |  8 
 .../blog/2019/09/09/ingesting-database-changes/index.html|  8 
 content/blog/2019/10/22/Hudi-On-Hops/index.html  |  8 
 .../index.html   |  8 
 content/blog/2020/01/15/delete-support-in-hudi/index.html|  8 
 content/blog/2020/01/20/change-capture-using-aws/index.html  |  8 
 content/blog/2020/03/22/exporting-hudi-datasets/index.html   |  8 
 .../blog/2020/04/27/apache-hudi-apache-zepplin/index.html|  8 
 .../05/28/monitoring-hudi-metrics-with-datadog/index.html|  8 
 .../index.html   |  8 
 .../index.html   |  8 
 .../16/Apache-Hudi-grows-cloud-data-lake-maturity/index.html |  8 
 content/blog/2020/08/04/PrestoDB-and-Apache-Hudi/index.html  |  8 
 .../18/hudi-incremental-processing-on-data-lakes/index.html  |  8 
 .../efficient-migration-of-large-parquet-tables/index.html   |  8 
 .../2020/08/21/async-compaction-deployment-model/index.html  |  8 
 .../2020/08/22/ingest-multiple-tables-using-hudi/index.html  |  8 
 .../2020/10/06/cdc-solution-using-hudi-by-nclouds/index.html |  8 
 .../2020/10/15/apache-hudi-meets-apache-flink/index.html |  8 
 .../2020/10/19/Origins-of-Data-Lake-at-Grofers/index.html|  8 
 .../2020/10/19/hudi-meets-aws-emr-and-aws-dms/index.html |  8 
 .../index.html   |  8 
 .../index.html   |  8 
 content/blog/2020/11/11/hudi-indexing-mechanisms/index.html  |  8 
 .../11/29/Can-Big-Data-Solutions-Be-Affordable/index.html|  8 
 .../index.html   |  8 
 content/blog/2021/01/27/hudi-clustering-intro/index.html |  8 
 content/blog/2021/02/13/hudi-key-generators/index.html   |  8 
 .../index.html   |  8 
 .../index.html   |  8 
 content/blog/2021/03/01/hudi-file-sizing/index.html  |  8 
 .../index.html   |  8 
 .../New-features-from-Apache-hudi-in-Amazon-EMR/index.html   |  8 
 .../index.html   |  8 
 .../blog/2021/05/12/Experts-primer-on-Apache-Hudi/index.html |  8 
 .../index.html   |  8 
 .../index.html   |  8 
 .../16/Amazon-Athena-expands-Apache-Hudi-support/index.html  |  8 
 .../index.html  

[GitHub] [hudi] yihua merged pull request #6492: [DOCS] Fix typo in compaction.md

2022-08-29 Thread GitBox


yihua merged PR #6492:
URL: https://github.com/apache/hudi/pull/6492





[hudi] branch asf-site updated: [DOCS] Fix typo in compaction.md (#6492)

2022-08-29 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new d98c2e1949 [DOCS] Fix typo in compaction.md (#6492)
d98c2e1949 is described below

commit d98c2e19493e8b26f082e791b8ca7b88ca38e397
Author: Terry Wang 
AuthorDate: Tue Aug 30 10:43:27 2022 +0800

[DOCS] Fix typo in compaction.md (#6492)
---
 website/docs/compaction.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/website/docs/compaction.md b/website/docs/compaction.md
index 9d73e31bd5..a6249b7ae7 100644
--- a/website/docs/compaction.md
+++ b/website/docs/compaction.md
@@ -132,9 +132,9 @@ Offline compaction needs to submit the Flink task on the 
command line. The progr
 
 |  Option Name  | Required | Default | Remarks |
 |  ---  | ---  | --- | --- |
-| `--path` | `frue` | `--` | The path where the target table is stored on Hudi 
|
+| `--path` | `true` | `--` | The path where the target table is stored on Hudi 
|
 | `--compaction-max-memory` | `false` | `100` | The index map size of log data 
during compaction, 100 MB by default. If you have enough memory, you can turn 
up this parameter |
 | `--schedule` | `false` | `false` | whether to execute the operation of 
scheduling compaction plan. When the write process is still writing, turning on 
this parameter have a risk of losing data. Therefore, it must be ensured that 
there are no write tasks currently writing data to this table when this 
parameter is turned on |
 | `--seq` | `false` | `LIFO` | The order in which compaction tasks are 
executed. Executing from the latest compaction plan by default. `LIFO`: 
executing from the latest plan. `FIFO`: executing from the oldest plan. |
 | `--service` | `false` | `false` | Whether to start a monitoring service that 
checks and schedules new compaction task in configured interval. |
-| `--min-compaction-interval-seconds` | `false` | `600(s)` | The checking 
interval for service mode, by default 10 minutes. |
\ No newline at end of file
+| `--min-compaction-interval-seconds` | `false` | `600(s)` | The checking 
interval for service mode, by default 10 minutes. |



[GitHub] [hudi] yihua commented on pull request #6492: [DOCS] Fix typo in compaction.md

2022-08-29 Thread GitBox


yihua commented on PR #6492:
URL: https://github.com/apache/hudi/pull/6492#issuecomment-1231076201

   @zjuwangg Thanks for your first contribution!





[hudi] branch asf-site updated: [DOCS][MINOR] Improve spark quick start doc (#6538)

2022-08-29 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new d9e0a47cb8 [DOCS][MINOR] Improve spark quick start doc (#6538)
d9e0a47cb8 is described below

commit d9e0a47cb88649dfdac2a13250737a590e50e5eb
Author: KnightChess <981159...@qq.com>
AuthorDate: Tue Aug 30 10:38:45 2022 +0800

[DOCS][MINOR] Improve spark quick start doc (#6538)
---
 website/docs/quick-start-guide.md| 12 
 .../versioned_docs/version-0.10.0/quick-start-guide.md   | 12 +---
 .../versioned_docs/version-0.10.1/quick-start-guide.md   | 16 
 .../versioned_docs/version-0.11.0/quick-start-guide.md   | 12 
 .../versioned_docs/version-0.11.1/quick-start-guide.md   | 12 
 .../versioned_docs/version-0.12.0/quick-start-guide.md   | 12 
 6 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/website/docs/quick-start-guide.md 
b/website/docs/quick-start-guide.md
index 02ebfc74e0..145c17f843 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -67,13 +67,15 @@ spark-shell \
 # Spark 3.1
 spark-shell \
   --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.12.0 \
-  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+  --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
 ```
 ```shell
 # Spark 2.4
 spark-shell \
   --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.12.0 \
-  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+  --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
 ```
 
 
@@ -104,14 +106,16 @@ pyspark \
 export PYSPARK_PYTHON=$(which python3)
 pyspark \
 --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.12.0 \
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+--conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
 ```
 ```shell
 # Spark 2.4
 export PYSPARK_PYTHON=$(which python3)
 pyspark \
 --packages org.apache.hudi:hudi-spark2.4-bundle_2.11:0.12.0 \
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+--conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
 ```
 
 
diff --git a/website/versioned_docs/version-0.10.0/quick-start-guide.md 
b/website/versioned_docs/version-0.10.0/quick-start-guide.md
index e3f38448e9..108b1071cd 100644
--- a/website/versioned_docs/version-0.10.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.10.0/quick-start-guide.md
@@ -41,17 +41,20 @@ From the extracted directory run spark-shell with Hudi as:
 // spark-shell for spark 3
 spark-shell \
   --packages 
org.apache.hudi:hudi-spark3-bundle_2.12:0.10.0,org.apache.spark:spark-avro_2.12:3.1.2
 \
-  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+  --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   
 // spark-shell for spark 2 with scala 2.12
 spark-shell \
   --packages 
org.apache.hudi:hudi-spark-bundle_2.12:0.10.0,org.apache.spark:spark-avro_2.12:2.4.4
 \
-  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+  --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
   
 // spark-shell for spark 2 with scala 2.11
 spark-shell \
   --packages 
org.apache.hudi:hudi-spark-bundle_2.11:0.10.0,org.apache.spark:spark-avro_2.11:2.4.4
 \
-  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+  --conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
 ```
 
 
@@ -91,16 +94,19 @@ export PYSPARK_PYTHON=$(which python3)
 pyspark
 --packages 
org.apache.hudi:hudi-spark3-bundle_2.12:0.10.0,org.apache.spark:spark-avro_2.12:3.1.2
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+--conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
 
 # for spark2 with scala 2.12
 pyspark
 --packages 
org.apache.hudi:hudi-spark-bundle_2.12:0.10.0,org.apache.spark:spark-avro_2.12:2.4.4
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
+--conf 
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
 
 # for spark2 with scala 2.11
 pyspark
 --packages 
org.apache.hudi:hudi-spark-bundle_2.11:0.10

[GitHub] [hudi] yihua merged pull request #6538: [DOCS][MINOR] Improve spark quick start doc

2022-08-29 Thread GitBox


yihua merged PR #6538:
URL: https://github.com/apache/hudi/pull/6538





[hudi] branch asf-site updated: [DOCS] Update Hudi support versions for Redshift Spectrum in the current doc. (#6521)

2022-08-29 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new d9460c5621 [DOCS] Update Hudi support versions for Redshift Spectrum 
in the current doc. (#6521)
d9460c5621 is described below

commit d9460c5621d3dd603ed074f428e7231158a6fb6c
Author: pomaster 
AuthorDate: Mon Aug 29 22:33:26 2022 -0400

[DOCS] Update Hudi support versions for Redshift Spectrum in the current 
doc. (#6521)

Co-authored-by: “pomaster” <“phong”_...@yahoo.com”>
---
 website/docs/query_engine_setup.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/query_engine_setup.md 
b/website/docs/query_engine_setup.md
index 619ec17ca1..63978797a6 100644
--- a/website/docs/query_engine_setup.md
+++ b/website/docs/query_engine_setup.md
@@ -92,7 +92,7 @@ to `org.apache.hadoop.hive.ql.io.HiveInputFormat`. Then 
proceed to query the tab
 
 
 ## Redshift Spectrum
-Copy on Write Tables in Apache Hudi versions 0.5.2, 0.6.0, 0.7.0, 0.8.0, 
0.9.0, and 0.10.0 can be queried via Amazon Redshift Spectrum external tables.
+Copy on Write Tables in Apache Hudi versions 0.5.2, 0.6.0, 0.7.0, 0.8.0, 
0.9.0, 0.10.x, 0.11.x and 0.12.0 can be queried via Amazon Redshift Spectrum 
external tables.
 :::note
 Hudi tables are supported only when AWS Glue Data Catalog is used. It's not 
supported when you use an Apache Hive metastore as the external catalog.
 :::



[GitHub] [hudi] yihua merged pull request #6521: [DOCS] Update Hudi support versions for Redshift Spectrum in the current doc.

2022-08-29 Thread GitBox


yihua merged PR #6521:
URL: https://github.com/apache/hudi/pull/6521





[GitHub] [hudi] yihua commented on pull request #6521: [DOCS] Update Hudi support versions for Redshift Spectrum in the current doc.

2022-08-29 Thread GitBox


yihua commented on PR #6521:
URL: https://github.com/apache/hudi/pull/6521#issuecomment-1231071099

   @pomaster Thanks for your first contribution!





[GitHub] [hudi] yihua commented on issue #6514: [SUPPORT] Creating Hudi table with SparkSQL fails with FileNotFoundException

2022-08-29 Thread GitBox


yihua commented on issue #6514:
URL: https://github.com/apache/hudi/issues/6514#issuecomment-1231070318

   @functicons  does adding 
`spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension` 
work for you?





[jira] [Commented] (HUDI-4340) DeltaStreamer bootstrap failed when metrics on caused by DateTimeParseException: Text '00000000000001999' could not be parsed

2022-08-29 Thread Teng Huo (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597475#comment-17597475
 ] 

Teng Huo commented on HUDI-4340:


PR https://github.com/apache/hudi/pull/6000 merged

> DeltaStreamer bootstrap failed when metrics on caused by 
> DateTimeParseException: Text '01999' could not be parsed
> -
>
> Key: HUDI-4340
> URL: https://issues.apache.org/jira/browse/HUDI-4340
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: deltastreamer, metrics
>Reporter: Teng Huo
>Assignee: Teng Huo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.1
>
> Attachments: error-deltastreamer.log
>
>
> Found this bug in Hudi integrate test ITTestHoodieDemo.java
> HoodieTimeline.METADATA_BOOTSTRAP_INSTANT_TS is a invalid value, 
> "01", which can not be parsed by DateTimeFormatter with format 
> SECS_INSTANT_TIMESTAMP_FORMAT = "MMddHHmmss" in method 
> HoodieInstantTimeGenerator.parseDateFromInstantTime
> Error code at 
> org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator.parseDateFromInstantTime(HoodieInstantTimeGenerator.java:96)
> https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java#L100
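The failure mode is easy to reproduce with plain `java.time`. The pattern string below is an assumption standing in for Hudi's seconds-granularity instant format, and the sentinel string is a shortened illustration rather than the actual bootstrap constant:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class InstantParseSketch {
    public static void main(String[] args) {
        // Assumed seconds-granularity instant pattern (yyyyMMddHHmmss).
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
        // An all-but-final-digits sentinel yields month 00 and day 00, which
        // are not valid calendar fields, so parsing throws instead of
        // returning a date.
        String sentinel = "00000000000001";
        try {
            LocalDateTime.parse(sentinel, fmt);
            System.out.println("parsed");
        } catch (DateTimeParseException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This is why a sentinel timestamp such as the metadata-bootstrap instant needs special-case handling before the generic date parsing path is invoked.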





[GitHub] [hudi] hudi-bot commented on pull request #6539: [HUDI-4739] Wrong value returned when key's length equals 1

2022-08-29 Thread GitBox


hudi-bot commented on PR #6539:
URL: https://github.com/apache/hudi/pull/6539#issuecomment-1231069104

   
   ## CI report:
   
   * 822e071498bbe67aaaced421c27cdffb8e9e6584 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11032)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] nsivabalan merged pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime

2022-08-29 Thread GitBox


nsivabalan merged PR #6000:
URL: https://github.com/apache/hudi/pull/6000





[GitHub] [hudi] szknb commented on issue #6530: [SUPPORT] org.apache.hudi.exception.HoodieException: Invalid partition name [2020/01/02, 2020/01/01, 2020/01/03]

2022-08-29 Thread GitBox


szknb commented on issue #6530:
URL: https://github.com/apache/hudi/issues/6530#issuecomment-1231068629

   @nsivabalan 
   `public class HudiExample {
   
   private static final Logger LOG = 
LogManager.getLogger(HudiExample.class);
   
   private static String tableType = HoodieTableType.COPY_ON_WRITE.name();
   
   public static void main(String[] args) throws Exception {
   
   String tablePath = "hdfs://haruna/home/xxx/xxx/hudi";
   String tableName = "hudi-test";
   SparkConf sparkConf = 
HoodieExampleSparkUtils.defaultSparkConf("hoodie-client-example");
   
   try (JavaSparkContext jsc = new JavaSparkContext(sparkConf)) {
   
   // Generator of some records to be loaded in.
    HoodieExampleDataGenerator<HoodieAvroPayload> dataGen = new 
HoodieExampleDataGenerator<>();
   
   // initialize the table, if not done already
   Path path = new Path(tablePath);
   FileSystem fs = FSUtils.getFs(tablePath, 
jsc.hadoopConfiguration());
   if (!fs.exists(path)) {
   
HoodieTableMetaClient.initTableType(jsc.hadoopConfiguration(), tablePath,
   new HoodieTableConfig.Builder()
   
.withTableType(HoodieTableType.valueOf(tableType))
   .withTableName(tableName)
   
.withPayloadClassName(HoodieTableType.valueOf(tableType), 
HoodieAvroPayload.class.getName()).build());
   }
   
   // Create the write client to write some records in
   HoodieWriteConfig cfg = HoodieWriteConfig
   .newBuilder()
   .withPath(tablePath)
   
.withSchema(HoodieExampleDataGenerator.TRIP_EXAMPLE_SCHEMA)
   .withParallelism(2, 2)
   .withDeleteParallelism(2)
   .forTable(tableName)
   
.withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())
   
.withCompactionConfig(HoodieCompactionConfig.newBuilder().archiveCommitsWith(20,
 30).build()).build();
    SparkRDDWriteClient<HoodieAvroPayload> client = new 
SparkRDDWriteClient<>(new HoodieSparkEngineContext(jsc), cfg);
   
   // inserts
   String newCommitTime = client.startCommit();
   LOG.info("Starting commit " + newCommitTime);
   
    List<HoodieRecord<HoodieAvroPayload>> records = 
dataGen.generateInserts(newCommitTime, 10);
    List<HoodieRecord<HoodieAvroPayload>> recordsSoFar = new 
ArrayList<>(records);
    JavaRDD<HoodieRecord<HoodieAvroPayload>> writeRecords = 
jsc.parallelize(records, 1);
   client.upsert(writeRecords, newCommitTime);
   
   LOG.info("insert finished");
   
   
   }
   }
   
   }`
   
   the HoodieExampleDataGenerator is: 
org.apache.hudi.examples.common.HoodieExampleDataGenerator;





[hudi] branch master updated (ac9ce85334 -> 71b8174058)

2022-08-29 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from ac9ce85334 [HUDI-4483] Fix checkstyle in integ-test module (#6523)
 add 71b8174058 [HUDI-4340] fix not parsable text DateTimeParseException by 
adding a method parseDateFromInstantTimeSafely for parsing timestamp when output 
metrics (#6000)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/client/BaseHoodieWriteClient.java  | 20 +--
 .../apache/hudi/client/SparkRDDWriteClient.java| 22 
 .../table/timeline/HoodieActiveTimeline.java   | 41 ++
 .../table/timeline/HoodieInstantTimeGenerator.java |  7 +---
 .../table/timeline/TestHoodieActiveTimeline.java   | 23 ++--
 5 files changed, 76 insertions(+), 37 deletions(-)



[GitHub] [hudi] nsivabalan commented on pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime

2022-08-29 Thread GitBox


nsivabalan commented on PR #6000:
URL: https://github.com/apache/hudi/pull/6000#issuecomment-1231068126

   Latest CI run succeeded: 
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=11009&view=results
   





[GitHub] [hudi] hudi-bot commented on pull request #6539: [HUDI-4739] Wrong value returned when key's length equals 1

2022-08-29 Thread GitBox


hudi-bot commented on PR #6539:
URL: https://github.com/apache/hudi/pull/6539#issuecomment-1231066421

   
   ## CI report:
   
   * 822e071498bbe67aaaced421c27cdffb8e9e6584 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6537: Avoid update metastore schema if only missing column in input

2022-08-29 Thread GitBox


hudi-bot commented on PR #6537:
URL: https://github.com/apache/hudi/pull/6537#issuecomment-1231066402

   
   ## CI report:
   
   * 9e63b76454a06d57a141ad4b844752abb346d3fa UNKNOWN
   * a245595d0c988610d845f6918fe8c5ea76383e92 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11029)
 
   * 00b9224ec8c49e83ca51d52351c782083a4fba84 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11030)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[hudi] branch master updated (c50b6346b5 -> ac9ce85334)

2022-08-29 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from c50b6346b5 [HUDI-4482] remove guava and use caffeine instead for cache 
(#6240)
 add ac9ce85334 [HUDI-4483] Fix checkstyle in integ-test module (#6523)

No new revisions were added by this update.

Summary of changes:
 hudi-integ-test/pom.xml|  1 -
 .../testsuite/HoodieContinousTestSuiteWriter.java  |  2 --
 .../testsuite/HoodieInlineTestSuiteWriter.java |  8 ---
 .../testsuite/HoodieMultiWriterTestSuiteJob.java   |  3 +--
 .../integ/testsuite/HoodieTestSuiteWriter.java |  4 ++--
 .../SparkDataSourceContinuousIngestTool.java   |  1 -
 .../testsuite/configuration/DFSDeltaConfig.java|  2 +-
 .../apache/hudi/integ/testsuite/dag/DagUtils.java  | 28 --
 .../integ/testsuite/dag/nodes/BaseQueryNode.java   |  3 +--
 .../dag/nodes/BaseValidateDatasetNode.java | 24 +--
 .../integ/testsuite/dag/nodes/HiveQueryNode.java   |  3 +--
 .../integ/testsuite/dag/nodes/HiveSyncNode.java|  1 -
 .../integ/testsuite/dag/nodes/PrestoQueryNode.java |  3 +--
 .../integ/testsuite/dag/nodes/TrinoQueryNode.java  |  5 ++--
 .../dag/nodes/ValidateAsyncOperations.java | 11 ++---
 .../testsuite/dag/scheduler/DagScheduler.java  |  1 -
 .../GenericRecordFullPayloadGenerator.java |  6 ++---
 .../testsuite/reader/DFSAvroDeltaInputReader.java  |  8 ---
 18 files changed, 41 insertions(+), 73 deletions(-)



[GitHub] [hudi] yihua merged pull request #6523: [HUDI-4483] fix checkstyle in integ-test module

2022-08-29 Thread GitBox


yihua merged PR #6523:
URL: https://github.com/apache/hudi/pull/6523





[GitHub] [hudi] yihua commented on pull request #6523: [HUDI-4483] fix checkstyle in integ-test module

2022-08-29 Thread GitBox


yihua commented on PR #6523:
URL: https://github.com/apache/hudi/pull/6523#issuecomment-1231066073

   CI is green.
   https://user-images.githubusercontent.com/2497195/187334475-1c57c3a4-a0ae-4e37-986c-7fba2a8e03a2.png
   





[GitHub] [hudi] hudi-bot commented on pull request #6000: [HUDI-4340] fix not parsable text DateTimeParseException in HoodieInstantTimeGenerator.parseDateFromInstantTime

2022-08-29 Thread GitBox


hudi-bot commented on PR #6000:
URL: https://github.com/apache/hudi/pull/6000#issuecomment-1231065908

   
   ## CI report:
   
   * 6f8e83a20276203550589848ef38953ae3edd5f5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11009)
 
   * dab63726e5470be1315bb0194720def2a61ecc14 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Assigned] (HUDI-4730) FIX Batch job cannot clean old commits&data files in clean Function

2022-08-29 Thread Jian Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Feng reassigned HUDI-4730:
---

Assignee: Jian Feng

> FIX Batch job cannot clean old commits&data files in clean Function
> ---
>
> Key: HUDI-4730
> URL: https://issues.apache.org/jira/browse/HUDI-4730
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Reporter: Jian Feng
>Assignee: Jian Feng
>Priority: Major
>  Labels: pull-request-available
>
> FIX Batch job cannot clean old commits&data files in clean Function





[GitHub] [hudi] yihua commented on issue #6540: [SUPPORT]KryoException when bulk insert into hudi with flink

2022-08-29 Thread GitBox


yihua commented on issue #6540:
URL: https://github.com/apache/hudi/issues/6540#issuecomment-1231065519

   @danny0405 another KryoException issue in Flink





[GitHub] [hudi] Zhangshunyu commented on issue #6528: [SUPPORT]How to clean the compacted .log and .hfiles in metadata?

2022-08-29 Thread GitBox


Zhangshunyu commented on issue #6528:
URL: https://github.com/apache/hudi/issues/6528#issuecomment-1231065241

   @yihua Ok, I see, thank you very much!





[GitHub] [hudi] hudi-bot commented on pull request #6534: [HUDI-4695] Fixing flaky TestInlineCompaction#testCompactionRetryOnFailureBasedOnTime

2022-08-29 Thread GitBox


hudi-bot commented on PR #6534:
URL: https://github.com/apache/hudi/pull/6534#issuecomment-1231063650

   
   ## CI report:
   
   * 1a56cdc2bc53917efb33ff786ff14775dd2b526b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11026)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Commented] (HUDI-4737) Fix flaky: TestHoodieSparkMergeOnReadTableRollback.testRollbackWithDeltaAndCompactionCommit

2022-08-29 Thread xi chaomin (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597468#comment-17597468
 ] 

xi chaomin commented on HUDI-4737:
--

Hi [~shivnarayan], this test may have been fixed by #5874 
(https://github.com/apache/hudi/pull/5874). Judging from the logging time and 
line numbers, this branch is not the latest master; shall we merge master and 
re-run the test?

 

 

> Fix flaky: 
> TestHoodieSparkMergeOnReadTableRollback.testRollbackWithDeltaAndCompactionCommit
> ---
>
> Key: HUDI-4737
> URL: https://issues.apache.org/jira/browse/HUDI-4737
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: sivabalan narayanan
>Priority: Major
>
> [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/9088/logs/21]
>  
> {code:java}
> 2022-06-06T07:55:56.8610256Z [ERROR] Tests run: 298, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 4,569.528 s <<< FAILURE! - in JUnit Vintage
> 2022-06-06T07:55:56.8611489Z [ERROR] boolean).[1] 
> true(testRollbackWithDeltaAndCompactionCommit  Time elapsed: 55.377 s  <<< 
> FAILURE!
> 2022-06-06T07:55:56.8612231Z org.opentest4j.AssertionFailedError: expected: 
> <0> but was: <1>
> 2022-06-06T07:55:56.8612919Z  at 
> org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
> 2022-06-06T07:55:56.8613677Z  at 
> org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
> 2022-06-06T07:55:56.8614730Z  at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
> 2022-06-06T07:55:56.8615742Z  at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
> 2022-06-06T07:55:56.8616614Z  at 
> org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:611)
> 2022-06-06T07:55:56.8617839Z  at 
> org.apache.hudi.table.functional.TestHoodieSparkMergeOnReadTableRollback.testRollbackWithDeltaAndCompactionCommit(TestHoodieSparkMergeOnReadTableRollback.java:268)
> 2022-06-06T07:55:56.8619135Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-06-06T07:55:56.8620057Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-06-06T07:55:56.8621014Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-06-06T07:55:56.8621778Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-06-06T07:55:56.8622518Z  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
> 2022-06-06T07:55:56.8623350Z  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-06-06T07:55:56.8624441Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-06-06T07:55:56.8625493Z  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-06-06T07:55:56.8626499Z  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-06-06T07:55:56.8642788Z  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
> 2022-06-06T07:55:56.8644032Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-06-06T07:55:56.8645036Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-06-06T07:55:56.8646046Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-06-06T07:55:56.8648269Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-06-06T07:55:56.8649118Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-06-06T07:55:56.8650108Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-06-06T07:55:56.8651091Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-06-06T07:55:56.8651889Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-06-06T07:55:56.8652809Z  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:212)
> 2022-06-06T07:55:56.8653936Z  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2022-06-06T07:55:56.8654845Z  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor

[GitHub] [hudi] yihua commented on issue #6528: [SUPPORT]How to clean the compacted .log and .hfiles in metadata?

2022-08-29 Thread GitBox


yihua commented on issue #6528:
URL: https://github.com/apache/hudi/issues/6528#issuecomment-1231062883

   The reason you don't see any instantTime + '002' in the timeline is that the 
clean action has not happened in the metadata table.





[GitHub] [hudi] yihua commented on issue #6528: [SUPPORT]How to clean the compacted .log and .hfiles in metadata?

2022-08-29 Thread GitBox


yihua commented on issue #6528:
URL: https://github.com/apache/hudi/issues/6528#issuecomment-1231062546

   > Hi @yihua, thanks for your reply, i will try once. BTW, what's the meaning 
of '002' here in writeClient.clean(instantTime + "002"); i didnt find any 
instantTime + '002' in timeline
   
   `002` is the suffix for the clean instant timestamp.  The metadata table 
writer takes the same timestamp from the deltacommit and adds the suffix of 
`001` for compaction and `002` for clean to differentiate from the 
corresponding deltacommit.  This is for easy debugging.
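The suffix scheme described above can be sketched as follows (a minimal illustration; `InstantSuffixDemo` and the sample timestamp are hypothetical, while the `001`/`002` suffixes are as described):

```java
public class InstantSuffixDemo {

  // Derive the compaction instant time from a deltacommit timestamp by
  // appending the "001" suffix described above.
  static String compactionInstant(String deltaCommitTime) {
    return deltaCommitTime + "001";
  }

  // Derive the clean instant time by appending the "002" suffix.
  static String cleanInstant(String deltaCommitTime) {
    return deltaCommitTime + "002";
  }

  public static void main(String[] args) {
    String deltaCommitTime = "20220829101530"; // hypothetical deltacommit instant
    String compaction = compactionInstant(deltaCommitTime);
    String clean = cleanInstant(deltaCommitTime);
    // Instant times compare lexicographically, so both derived instants sort
    // after the deltacommit they were based on, never before it.
    System.out.println(deltaCommitTime.compareTo(compaction) < 0); // prints true
    System.out.println(compaction.compareTo(clean) < 0);           // prints true
  }
}
```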





[GitHub] [hudi] hbgstc123 opened a new issue, #6540: [SUPPORT]KryoException when bulk insert into hudi with flink

2022-08-29 Thread GitBox


hbgstc123 opened a new issue, #6540:
URL: https://github.com/apache/hudi/issues/6540

   When bulk insert into hudi with flink, flink job fail with Exception
   com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
   
   -- hudi table DDL
   CREATE TEMPORARY TABLE table_one
   (
   imp_date string,
   id bigint,
   name string,
   ts timestamp(3)
   ) PARTITIONED BY (imp_date)
   WITH
   (
   'connector' = 'hudi',
   'path' = ${hdfs_path},
   'write.operation' = 'bulk_insert',
   'table.type' = 'MERGE_ON_READ',
   'hoodie.table.keygenerator.class' = 
'org.apache.hudi.keygen.SimpleKeyGenerator',
   'hoodie.datasource.write.recordkey.field' = 'id',
   'write.precombine.field' = 'ts',
   'hive_sync.enable' = 'true',
   'hive_sync.mode' = 'hms',
   'hive_sync.metastore.uris' = 'thrift://...',
   'hive_sync.db' = 'hive_db',
   'hive_sync.table' = 'table_one',
   'hive_sync.partition_fields' = 'imp_date',
   'hive_sync.partition_extractor_class' = 
'org.apache.hudi.hive.MultiPartKeysValueExtractor',
   'hoodie.datasource.write.hive_style_partitioning' = 'true',
   'hoodie.metadata.enable'='true'
   );
   
   -- insert SQL
   insert into table_one
   select  
    DATE_FORMAT(ts, 'yyyyMMdd') || cast(hour(ts) as string) as dt
   ,id
   ,`name`
   ,ts
   from source_table;
   
   
   **Environment Description**
   
   * Hudi version : 0.11 & 0.12
   
   * Flink version : 1.13
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   
   **Stacktrace**
   
   com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
   Serialization trace:
   cleaner (org.apache.flink.core.memory.MemorySegment)
   segments (org.apache.flink.table.data.binary.BinaryRowData)
   at 
com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:82)
   at 
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
   at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:577)
   at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:320)
   at 
com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:289)
   at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:577)
   at 
com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:68)
   at 
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
   at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:505)
   at 
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:266)
   at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:69)
   at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
   at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
   at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
   at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
   at 
org.apache.flink.table.runtime.util.StreamRecordCollector.collect(StreamRecordCollector.java:44)
   at 
org.apache.hudi.sink.bulk.sort.SortOperator.endInput(SortOperator.java:113)
   at 
org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.endOperatorInput(StreamOperatorWrapper.java:91)
   at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.endInput(OperatorChain.java:441)
   at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
   at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:427)
   at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
   at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:688)
   at 
org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:643)
   at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:654)
   at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:627)
   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:782)
   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
   at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.NullPointerException
   at 
com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:80)
   at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:488)
   at 
com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:57)
   ... 28 more
   
   



[jira] [Updated] (HUDI-4739) Wrong value returned when length equals 1

2022-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4739:
-
Labels: pull-request-available  (was: )

> Wrong value returned when length equals 1
> -
>
> Key: HUDI-4739
> URL: https://issues.apache.org/jira/browse/HUDI-4739
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: wuwenchi
>Priority: Major
>  Labels: pull-request-available
>
> In "KeyGenUtils#extractRecordKeys" function, it will return the value 
> corresponding to the key, but when the length is equal to 1, the key and 
> value are returned.
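A simplified re-implementation sketches the patched behavior (a sketch only — the class name and the `__null__`/`__empty__` placeholder literals are assumptions standing in for Hudi's actual constants):

```java
import java.util.Arrays;

public class ExtractKeysDemo {

  // Placeholder literals assumed to mirror KeyGenUtils' constants.
  static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
  static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";

  // Simplified version of the patched extractRecordKeys: always split each
  // "field:value" pair and return only the value, even for a single key.
  static String[] extractRecordKeys(String recordKey) {
    return Arrays.stream(recordKey.split(","))
        .map(kv -> {
          String[] kvArray = kv.split(":");
          if (kvArray[1].equals(NULL_RECORDKEY_PLACEHOLDER)) {
            return null;
          } else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) {
            return "";
          } else {
            return kvArray[1];
          }
        })
        .toArray(String[]::new);
  }

  public static void main(String[] args) {
    // Before the fix, a single-field key returned the whole "id:1" string;
    // after it, only the value is returned.
    System.out.println(Arrays.toString(extractRecordKeys("id:1")));               // prints [1]
    System.out.println(Arrays.toString(extractRecordKeys("id:1,name:__null__"))); // prints [1, null]
  }
}
```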





[GitHub] [hudi] wuwenchi opened a new pull request, #6539: [HUDI-4739] Wrong value returned when key's length equals 1

2022-08-29 Thread GitBox


wuwenchi opened a new pull request, #6539:
URL: https://github.com/apache/hudi/pull/6539

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to 
mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] microbearz commented on issue #5792: [SUPPORT] Update hudi table(using SparkSQL) failed when the column contains `null` value in other records

2022-08-29 Thread GitBox


microbearz commented on issue #5792:
URL: https://github.com/apache/hudi/issues/5792#issuecomment-1231055367

   @a0x I tried to reproduce with master branch, and failed at step 3.
   ` Cannot write 'note': NullType is incompatible with StringType;`





[jira] [Created] (HUDI-4739) Wrong value returned when length equals 1

2022-08-29 Thread wuwenchi (Jira)
wuwenchi created HUDI-4739:
--

 Summary: Wrong value returned when length equals 1
 Key: HUDI-4739
 URL: https://issues.apache.org/jira/browse/HUDI-4739
 Project: Apache Hudi
  Issue Type: Bug
Reporter: wuwenchi


In "KeyGenUtils#extractRecordKeys" function, it will return the value 
corresponding to the key, but when the length is equal to 1, the key and value 
are returned.





[GitHub] [hudi] Zhangshunyu commented on issue #6528: [SUPPORT]How to clean the compacted .log and .hfiles in metadata?

2022-08-29 Thread GitBox


Zhangshunyu commented on issue #6528:
URL: https://github.com/apache/hudi/issues/6528#issuecomment-1231054611

   Hi @yihua, thanks for your reply, i will try once.
   BTW, what's the meaning of '002' here in writeClient.clean(instantTime + 
"002");
   i didnt find any instantTime + '002' in timeline





[GitHub] [hudi] KnightChess commented on issue #6514: [SUPPORT] Creating Hudi table with SparkSQL fails with FileNotFoundException

2022-08-29 Thread GitBox


KnightChess commented on issue #6514:
URL: https://github.com/apache/hudi/issues/6514#issuecomment-1231052745

   @functicons I think all versions need it. The doc description is a bit confusing.





[GitHub] [hudi] yihua commented on issue #6528: [SUPPORT]How to clean the compacted .log and .hfiles in metadata?

2022-08-29 Thread GitBox


yihua commented on issue #6528:
URL: https://github.com/apache/hudi/issues/6528#issuecomment-1231051936

   @Zhangshunyu as @nsivabalan mentioned, Hudi manages the compaction and cleaning 
for the metadata table internally, as shown below in the 
`HoodieBackedTableMetadataWriter` class:
   ```
   protected void cleanIfNecessary(BaseHoodieWriteClient writeClient, String instantTime) {
     Option<HoodieInstant> lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()
         .getCommitTimeline().filterCompletedInstants().lastInstant();
     if (lastCompletedCompactionInstant.isPresent()
         && metadataMetaClient.getActiveTimeline().filterCompletedInstants()
         .findInstantsAfter(lastCompletedCompactionInstant.get().getTimestamp()).countInstants() < 3) {
       // do not clean the log files immediately after compaction to give some buffer time for metadata table reader,
       // because there is case that the reader has prepared for the log file readers already before the compaction completes
       // while before/during the reading of the log files, the cleaning triggers and delete the reading files,
       // then a FileNotFoundException(for LogFormatReader) or NPE(for HFileReader) would throw.

       // 3 is a value that I think is enough for metadata table reader.
       return;
     }

     // Trigger cleaning with suffixes based on the same instant time. This ensures that any future
     // delta commits synced over will not have an instant time lesser than the last completed instant on the
     // metadata table.
     writeClient.clean(instantTime + "002");
   }
   ```
   
   Also, as laid out above, the current logic prevents cleaning from happening within 3 instants after the compaction of the metadata table. That could be the reason why you don't see cleaning, as `hoodie.metadata.compact.max.delta.commits` is set to 1.
   
   Could you try setting `hoodie.metadata.compact.max.delta.commits` to 5 and see if that solves your problem?
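   For reference, this knob is passed like any other Hudi write config. A minimal sketch (the key is from Hudi's configuration reference; how you supply it depends on your pipeline, e.g. a properties file, datasource write options, or DeltaStreamer's `--hoodie-conf` flag):
   
   ```
   hoodie.metadata.compact.max.delta.commits=5
   ```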





[GitHub] [hudi] KnightChess opened a new pull request, #6538: [MINOR] improve spark quick start doc

2022-08-29 Thread GitBox


KnightChess opened a new pull request, #6538:
URL: https://github.com/apache/hudi/pull/6538

   ### Change Logs
   
   #6405, #6514: users following the current doc will not know to apply the extended configs.
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] dik111 commented on issue #6430: [SUPPORT]Flink SQL can't read complex type data Java client write

2022-08-29 Thread GitBox


dik111 commented on issue #6430:
URL: https://github.com/apache/hudi/issues/6430#issuecomment-1231050940

   I met the same error in flink-sql-client.




