[GitHub] [hudi] yihua commented on a change in pull request #3743: [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module

2021-10-05 Thread GitBox


yihua commented on a change in pull request #3743:
URL: https://github.com/apache/hudi/pull/3743#discussion_r722939258



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngrade.java
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.util.FileIOUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieUpgradeDowngradeException;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.Date;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Properties;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngrade {
+
+  private static final Logger LOG = LogManager.getLogger(UpgradeDowngrade.class);
+  public static final String HOODIE_UPDATED_PROPERTY_FILE = "hoodie.properties.updated";
+
+  private HoodieTableMetaClient metaClient;
+  protected HoodieWriteConfig config;
+  protected HoodieEngineContext context;
+  private transient FileSystem fs;
+  private Path updatedPropsFilePath;
+  private Path propsFilePath;
+
+  public UpgradeDowngrade(HoodieTableMetaClient metaClient, HoodieWriteConfig config, HoodieEngineContext context) {
+    this.metaClient = metaClient;
+    this.config = config;
+    this.context = context;
+    this.fs = metaClient.getFs();
+    this.updatedPropsFilePath = new Path(metaClient.getMetaPath(), HOODIE_UPDATED_PROPERTY_FILE);
+    this.propsFilePath = new Path(metaClient.getMetaPath(), HoodieTableConfig.HOODIE_PROPERTIES_FILE);
+  }
+
+  public boolean needsUpgradeOrDowngrade(HoodieTableVersion toVersion) {
+    HoodieTableVersion fromVersion = metaClient.getTableConfig().getTableVersion();
+    // Ensure no inflight commits & versions are same

Review comment:
   Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yihua commented on a change in pull request #3743: [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module

2021-10-05 Thread GitBox


yihua commented on a change in pull request #3743:
URL: https://github.com/apache/hudi/pull/3743#discussion_r722938371



##
File path: 
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/FlinkTaskContextSupplier.java
##
@@ -62,4 +64,9 @@ public RuntimeContext getFlinkRuntimeContext() {
     return Option.empty();
   }
 
+  @Override
+  public String getPartitionColumns(Properties props) {

Review comment:
   Got it. I put `getPartitionColumns()` into `BaseUpgradeDowngradeHelper`, and each engine should implement it.
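
   The arrangement described in this comment can be illustrated with a minimal sketch. Only the names `BaseUpgradeDowngradeHelper` and `getPartitionColumns(Properties)` come from this thread; the two engine classes, the property key, and the `describe` method are hypothetical stand-ins, not the actual Hudi code.

```java
import java.util.Properties;

// Hypothetical sketch: only BaseUpgradeDowngradeHelper and getPartitionColumns
// are names taken from the review thread; everything else is assumed.
public class HelperSketch {

  /** Engine-agnostic contract that the common upgrade/downgrade code depends on. */
  interface BaseUpgradeDowngradeHelper {
    String getPartitionColumns(Properties props);
  }

  /** Assumed engine implementation that reads a made-up property key. */
  static class EngineAHelper implements BaseUpgradeDowngradeHelper {
    @Override
    public String getPartitionColumns(Properties props) {
      return props.getProperty("example.partition.fields", "");
    }
  }

  /** Assumed engine implementation that reports no partition columns. */
  static class EngineBHelper implements BaseUpgradeDowngradeHelper {
    @Override
    public String getPartitionColumns(Properties props) {
      return "";
    }
  }

  /** Common code sees only the interface, never a concrete engine class. */
  static String describe(BaseUpgradeDowngradeHelper helper, Properties props) {
    String cols = helper.getPartitionColumns(props);
    return cols.isEmpty() ? "non-partitioned" : "partitioned by " + cols;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("example.partition.fields", "country,dt");
    System.out.println(describe(new EngineAHelper(), props));
    System.out.println(describe(new EngineBHelper(), props));
  }
}
```

   Keeping the engine-specific lookup behind one interface lets the shared module stay free of engine dependencies, which appears to be the goal of the refactoring.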








[GitHub] [hudi] yihua commented on a change in pull request #3743: [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module

2021-10-05 Thread GitBox


yihua commented on a change in pull request #3743:
URL: https://github.com/apache/hudi/pull/3743#discussion_r722937716



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/UpgradeDowngrade.java
##
@@ -0,0 +1,179 @@
+package org.apache.hudi.table.upgrade;
+
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.common.util.FileIOUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieUpgradeDowngradeException;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.util.Date;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Properties;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a version change.
+ */
+public class UpgradeDowngrade {
+
+  private static final Logger LOG = LogManager.getLogger(UpgradeDowngrade.class);
+  public static final String HOODIE_UPDATED_PROPERTY_FILE = "hoodie.properties.updated";
+
+  private HoodieTableMetaClient metaClient;
+  protected HoodieWriteConfig config;
+  protected HoodieEngineContext context;
+  private transient FileSystem fs;
+  private Path updatedPropsFilePath;
+  private Path propsFilePath;
+
+  public UpgradeDowngrade(HoodieTableMetaClient metaClient, HoodieWriteConfig config, HoodieEngineContext context) {
+    this.metaClient = metaClient;
+    this.config = config;
+    this.context = context;
+    this.fs = metaClient.getFs();
+    this.updatedPropsFilePath = new Path(metaClient.getMetaPath(), HOODIE_UPDATED_PROPERTY_FILE);
+    this.propsFilePath = new Path(metaClient.getMetaPath(), HoodieTableConfig.HOODIE_PROPERTIES_FILE);
+  }
+
+  public boolean needsUpgradeOrDowngrade(HoodieTableVersion toVersion) {
+    HoodieTableVersion fromVersion = metaClient.getTableConfig().getTableVersion();
+    // Ensure no inflight commits & versions are same
+    return toVersion.versionCode() != fromVersion.versionCode();
+  }
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version if need be.
+   * 
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in all write paths.
+   * 
+   * Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3), and Hoodie version was upgraded to 0.6.0,
+   * Hoodie table version gets bumped to 1 and there are some upgrade steps need to be executed before doing any writes.
+   * Similarly, if a dataset was created using Hoodie version 0.6.0 or Hoodie table version 1 and then hoodie was downgraded
+   * to pre 0.6.0 or to Hoodie table version 0, then some downgrade steps need to be executed before proceeding w/ any writes.
+   * 
+   * On a high level, these are the steps performed
+   * 
+   * Step1 : Understand current hoodie table version and table version from hoodie.properties file
+   * Step2 : Delete any left over .updated from previous upgrade/downgrade
+   * Step3 : If version are different, perform upgrade/downgrade.
+   * Step4 : Copy hoodie.properties -> hoodie.properties.updated with the version updated
+   * Step6 : Rename hoodie.properties.updated to hoodie.properties
+   * 
+   *
+   * @param toVersion   version to which upgrade or downgrade has to be done.
+   * @param instantTime current instant time that should not be touched.
+   */
+  public void run(HoodieTableVersion toVersion, String instantTime) {
+    try {
+      // Fetch version from property file and current version
+      HoodieTableVersion fromVersion = metaClient.getTableConfig().getTableVersion();
+      if (!needsUpgradeOrDowngrade(toVersion)) {
+        return;
+      }
+
+      if (fs.exists(updatedPropsFilePath)) {
+        // this can be left over .updated file from a failed attem
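
The steps listed in the Javadoc of the quoted `run` method can be sketched roughly as follows. This is an illustration only: it uses `java.nio.file` on a local directory instead of the Hadoop `FileSystem` used in the quoted class, the property key `example.table.version` is a made-up placeholder, and the version-specific upgrade/downgrade work is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

// Illustrative sketch of the "write .updated, then rename" flow on a local directory.
public class PropsSwapSketch {
  static final String PROPS = "hoodie.properties";
  static final String UPDATED = "hoodie.properties.updated";
  static final String VERSION_KEY = "example.table.version"; // hypothetical key

  static void run(Path metaDir, int toVersion) throws IOException {
    Path propsFile = metaDir.resolve(PROPS);
    Path updatedFile = metaDir.resolve(UPDATED);

    // Read the current version from the properties file.
    Properties props = new Properties();
    try (InputStream in = Files.newInputStream(propsFile)) {
      props.load(in);
    }
    int fromVersion = Integer.parseInt(props.getProperty(VERSION_KEY, "0"));
    if (fromVersion == toVersion) {
      return; // nothing to upgrade or downgrade
    }

    // Delete any .updated file left over from a previously failed attempt.
    Files.deleteIfExists(updatedFile);

    // The version-specific upgrade/downgrade work would run here (omitted).

    // Write a copy of the properties with the new version to the .updated file.
    props.setProperty(VERSION_KEY, Integer.toString(toVersion));
    try (OutputStream out = Files.newOutputStream(updatedFile)) {
      props.store(out, "updated table version");
    }

    // Rename the .updated file over the original properties file.
    Files.move(updatedFile, propsFile, StandardCopyOption.REPLACE_EXISTING);
  }
}
```

Writing to a side file and renaming it at the end means a crash midway leaves the original `hoodie.properties` untouched, and the leftover `.updated` file is cleaned up on the next run.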

[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * afa26cb34e05fd49056b2e072457b3d92bacaa91 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2514)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3743: [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3743:
URL: https://github.com/apache/hudi/pull/3743#issuecomment-932639881


   
   ## CI report:
   
   * 3464963ae47ca4bddc4c57f5c1f9e14c2d87b318 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2516)
 
   * eec99609d3670d390be3bb8e1da6b6aacec10168 UNKNOWN
   * 21f5296fc6ec0e4b2de6e22762f3dd189c878e01 UNKNOWN
   
   




[GitHub] [hudi] yihua commented on a change in pull request #3743: [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module

2021-10-05 Thread GitBox


yihua commented on a change in pull request #3743:
URL: https://github.com/apache/hudi/pull/3743#discussion_r722936410



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackUtils.java
##
@@ -120,21 +120,24 @@ static HoodieRollbackStat mergeRollbackStat(HoodieRollbackStat stat1, HoodieRoll
    * Generate all rollback requests that we need to perform for rolling back this action without actually performing rolling back for MOR table type.
    *
    * @param instantToRollback Instant to Rollback
-   * @param table instance of {@link HoodieTable} to use.
-   * @param context instance of {@link HoodieEngineContext} to use.
+   * @param metaClient        instance of {@link HoodieTableMetaClient} to use.
+   * @param config            Write config.
+   * @param context           instance of {@link HoodieEngineContext} to use.
+   * @param fileSystemView    File system view.
    * @return list of rollback requests
    */
-  public static List generateRollbackRequestsUsingFileListingMOR(HoodieInstant instantToRollback, HoodieTable table, HoodieEngineContext context) throws IOException {
+  public static List generateRollbackRequestsUsingFileListingMOR(
+      HoodieInstant instantToRollback, HoodieTableMetaClient metaClient, HoodieWriteConfig config,
+      HoodieEngineContext context, TableFileSystemView.SliceView fileSystemView) throws IOException {

Review comment:
   The sliceView argument is removed.








[GitHub] [hudi] yihua commented on a change in pull request #3743: [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module

2021-10-05 Thread GitBox


yihua commented on a change in pull request #3743:
URL: https://github.com/apache/hudi/pull/3743#discussion_r722936242



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/WriteMarkersFactory.java
##
@@ -34,24 +36,28 @@
   private static final Logger LOG = LogManager.getLogger(WriteMarkersFactory.class);
 
   /**
-   * @param markerType the type of markers to use
-   * @param table {@code HoodieTable} instance
+   * @param markerType  the type of markers to use
+   * @param metaClient  {@link HoodieTableMetaClient} instance to use
+   * @param config      Write config
+   * @param context     {@link HoodieEngineContext} instance to use
    * @param instantTime current instant time
-   * @return  {@code WriteMarkers} instance based on the {@code MarkerType}
+   * @return {@code WriteMarkers} instance based on the {@code MarkerType}
    */
-  public static WriteMarkers get(MarkerType markerType, HoodieTable table, String instantTime) {
+  public static WriteMarkers get(
+      MarkerType markerType, HoodieTableMetaClient metaClient, HoodieWriteConfig config,

Review comment:
   Right. I have reverted the changes to the markers-related APIs.








[GitHub] [hudi] yihua commented on a change in pull request #3743: [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module

2021-10-05 Thread GitBox


yihua commented on a change in pull request #3743:
URL: https://github.com/apache/hudi/pull/3743#discussion_r722935917



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackStrategy.java
##
@@ -63,7 +63,8 @@ public ListingBasedRollbackStrategy(HoodieTable table,
             table.getMetaClient().getBasePath(), config);
       } else {
         rollbackRequests = RollbackUtils
-            .generateRollbackRequestsUsingFileListingMOR(instantToRollback, table, context);
+            .generateRollbackRequestsUsingFileListingMOR(

Review comment:
   The reason for refactoring the rollback methods was to avoid having upgrade and downgrade helpers in the engine client package. As discussed offline, I am keeping one helper class for the engine-specific logic needed by the upgrade and downgrade actions. Given that, I have left these rollback methods unchanged.








[GitHub] [hudi] fengjian428 opened a new issue #3755: [Delta Streamer] file name mismatch with meta when compaction running

2021-10-05 Thread GitBox


fengjian428 opened a new issue #3755:
URL: https://github.com/apache/hudi/issues/3755


   Environment: Hudi 0.9, HBase 1.4.12
   
   When I run DeltaStreamer (version 0.9) to ingest data from Kafka into an HBase-indexed MOR table, I hit this error during compaction after a few commits:
   
![image](https://user-images.githubusercontent.com/4403474/136153476-785f7e62-4b26-4f0a-9b16-1ec7010da6b4.png)
   
   In HDFS there is a file with the same fileId and commit instant, but the middle part of its name is different:
   hdfs://tl5/projects/data_vite/mysql_ingestion/rti_vite/shopee_item_v4_db__item_v4_tab_newHbase/BR/2021-10/813800cd-1aaf-43ea-829f-4feef4a51cb3-0_19-2672-4427765_20211006051032.parquet
   Below is the content of 20211006051032.commit:
   
   
![image](https://user-images.githubusercontent.com/4403474/136153507-ebc87179-d4fd-4737-9b07-2218f35667bb.png)
   
   What do 2672-4427765 and 2657-4368242 mean? Why do they not match, and how can I fix this error?
   I tried recreating the table, and it happened again.
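
   As background for the question about the numbers: Hudi base file names are generally laid out as `<fileId>_<writeToken>_<instantTime>.<extension>`, where the write token is `<taskPartitionId>-<stageId>-<taskAttemptId>`. Assuming that layout applies here, `2672-4427765` would be the stage ID and task attempt ID of the task that wrote the file. The sketch below only splits a name into those parts; the class and method names are hypothetical and it does not call any Hudi API.

```java
// Hypothetical helper that splits a base file name of the assumed form
// <fileId>_<taskPartitionId>-<stageId>-<taskAttemptId>_<instantTime>.<ext>
public class BaseFileNameSketch {

  /** Returns {fileId, taskPartitionId, stageId, taskAttemptId, instantTime}. */
  static String[] parse(String fileName) {
    int dot = fileName.lastIndexOf('.');
    String stem = dot >= 0 ? fileName.substring(0, dot) : fileName;
    // The file ID contains hyphens but no underscores, so '_' separates the three fields.
    String[] parts = stem.split("_");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Unexpected file name layout: " + fileName);
    }
    String[] token = parts[1].split("-");
    if (token.length != 3) {
      throw new IllegalArgumentException("Unexpected write token: " + parts[1]);
    }
    return new String[] {parts[0], token[0], token[1], token[2], parts[2]};
  }

  public static void main(String[] args) {
    String name = "813800cd-1aaf-43ea-829f-4feef4a51cb3-0_19-2672-4427765_20211006051032.parquet";
    String[] p = parse(name);
    System.out.println("fileId=" + p[0]);
    System.out.println("taskPartitionId=" + p[1] + " stageId=" + p[2] + " taskAttemptId=" + p[3]);
    System.out.println("instantTime=" + p[4]);
  }
}
```

   Two files that share a file ID and instant time but differ in the write token would, under this reading, come from different task attempts for the same write.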






[GitHub] [hudi] hudi-bot edited a comment on pull request #3743: [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3743:
URL: https://github.com/apache/hudi/pull/3743#issuecomment-932639881


   
   ## CI report:
   
   * 3464963ae47ca4bddc4c57f5c1f9e14c2d87b318 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2516)
 
   * eec99609d3670d390be3bb8e1da6b6aacec10168 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3743: [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3743:
URL: https://github.com/apache/hudi/pull/3743#issuecomment-932639881


   
   ## CI report:
   
   * 3464963ae47ca4bddc4c57f5c1f9e14c2d87b318 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2516)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3743: [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3743:
URL: https://github.com/apache/hudi/pull/3743#issuecomment-932639881


   
   ## CI report:
   
   * 7ba73cb76255dd2dfc85a078cc03120595ad36dd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2488)
   * 3464963ae47ca4bddc4c57f5c1f9e14c2d87b318 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2516)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3743: [HUDI-2513] Refactor table upgrade and downgrade actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3743:
URL: https://github.com/apache/hudi/pull/3743#issuecomment-932639881


   
   ## CI report:
   
   * 7ba73cb76255dd2dfc85a078cc03120595ad36dd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2488)
 
   * 3464963ae47ca4bddc4c57f5c1f9e14c2d87b318 UNKNOWN
   
   




[GitHub] [hudi] zhedoubushishi edited a comment on pull request #3416: [HUDI-2362] Add external config file support

2021-10-05 Thread GitBox


zhedoubushishi edited a comment on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-935558416


   Updated the description and resolved the conflicts. @xushiyan






[GitHub] [hudi] zhedoubushishi commented on pull request #3416: [HUDI-2362] Add external config file support

2021-10-05 Thread GitBox


zhedoubushishi commented on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-935558416


   Updated the description and resolved the conflicts.






[GitHub] [hudi] hudi-bot edited a comment on pull request #3416: [HUDI-2362] Add external config file support

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-893712830


   
   ## CI report:
   
   * 2dd5fe4788faf920e276c5ecd755bf5950d55a20 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2199)
   * b1c19180582fa6f0139b1a897aba36834a5b408f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2515)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3416: [HUDI-2362] Add external config file support

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-893712830


   
   ## CI report:
   
   * 2dd5fe4788faf920e276c5ecd755bf5950d55a20 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2199)
 
   * b1c19180582fa6f0139b1a897aba36834a5b408f UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#issuecomment-935480787


   
   ## CI report:
   
   * 497ffa10f0e02708acc500487007dea82efb760f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2513)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * 095d325883c4fcbf4acdc2fcab3065cbd8923690 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2510)
   * afa26cb34e05fd49056b2e072457b3d92bacaa91 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2514)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * 095d325883c4fcbf4acdc2fcab3065cbd8923690 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2510)
 
   * afa26cb34e05fd49056b2e072457b3d92bacaa91 UNKNOWN
   
   




[GitHub] [hudi] guan-yq commented on issue #3657: [SUPPORT] Failed to insert data by flink-sql

2021-10-05 Thread GitBox


guan-yq commented on issue #3657:
URL: https://github.com/apache/hudi/issues/3657#issuecomment-935527868


   I have the same problem.






[jira] [Updated] (HUDI-2510) QuickStart html page is showing 404

2021-10-05 Thread Vinoth Govindarajan (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Govindarajan updated HUDI-2510:
--------------------------------------
    Status: In Progress  (was: Open)

> QuickStart html page is showing 404
> -----------------------------------
>
> Key: HUDI-2510
> URL: https://issues.apache.org/jira/browse/HUDI-2510
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Rajesh Mahindra
> Assignee: Vinoth Govindarajan
> Priority: Major
> Labels: pull-request-available
>
> Some external entities such as GCP are linking to [https://hudi.apache.org/quickstart.html] for quick start.
>
> [https://cloud.google.com/blog/products/data-analytics/getting-started-with-new-table-formats-on-dataproc]
>
> Can we create an alias to the actual quick start link?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2510) QuickStart html page is showing 404

2021-10-05 Thread Vinoth Govindarajan (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Govindarajan updated HUDI-2510:
--------------------------------------
    Status: Patch Available  (was: In Progress)






[GitHub] [hudi] fengjian428 closed issue #3749: [SUPPORT] is there solution to solve hbase data screw issue?

2021-10-05 Thread GitBox


fengjian428 closed issue #3749:
URL: https://github.com/apache/hudi/issues/3749


   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#issuecomment-935480787


   
   ## CI report:
   
   * 497ffa10f0e02708acc500487007dea82efb760f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2513)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-05 Thread GitBox


hudi-bot commented on pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#issuecomment-935480787


   
   ## CI report:
   
   * 497ffa10f0e02708acc500487007dea82efb760f UNKNOWN
   
   




[jira] [Updated] (HUDI-2510) QuickStart html page is showing 404

2021-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2510:
-
Labels: pull-request-available  (was: )

> QuickStart html page is showing 404
> ---
>
> Key: HUDI-2510
> URL: https://issues.apache.org/jira/browse/HUDI-2510
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Rajesh Mahindra
>Assignee: Vinoth Govindarajan
>Priority: Major
>  Labels: pull-request-available
>
> Some external entities such as GCP are linking to 
> [https://hudi.apache.org/quickstart.html] for quick start. 
>  
> [https://cloud.google.com/blog/products/data-analytics/getting-started-with-new-table-formats-on-dataproc]
>  
> Can we create an alias to the actual quick start link?





[GitHub] [hudi] YannByron opened a new pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-05 Thread GitBox


YannByron opened a new pull request #3754:
URL: https://github.com/apache/hudi/pull/3754


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] vingov commented on pull request #3753: [HUDI-2510] Added a quickstart redirect page to fix broken external links in GCP docs

2021-10-05 Thread GitBox


vingov commented on pull request #3753:
URL: https://github.com/apache/hudi/pull/3753#issuecomment-935475881


   @rmahindra123 - I explored adding a redirect in the config 
file, but for some reason it is not allowed or does not work, so I created a new 
quickstart page and did the redirection using JS. 
   
   Please review, thanks!






[jira] [Commented] (HUDI-2510) QuickStart html page is showing 404

2021-10-05 Thread Vinoth Govindarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424790#comment-17424790
 ] 

Vinoth Govindarajan commented on HUDI-2510:
---

[~rmahindra] - I explored adding a redirect in the config file, 
but for some reason it is not allowed or does not work, so I created a new quickstart 
page and did the redirection using JS.  This is the PR for the same; please 
review: https://github.com/apache/hudi/pull/3753

> QuickStart html page is showing 404
> ---
>
> Key: HUDI-2510
> URL: https://issues.apache.org/jira/browse/HUDI-2510
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Rajesh Mahindra
>Assignee: Vinoth Govindarajan
>Priority: Major
>
> Some external entities such as GCP are linking to 
> [https://hudi.apache.org/quickstart.html] for quick start. 
>  
> [https://cloud.google.com/blog/products/data-analytics/getting-started-with-new-table-formats-on-dataproc]
>  
> Can we create an alias to the actual quick start link?





[GitHub] [hudi] vingov opened a new pull request #3753: Added a quickstart redirect page to fix broken external links in GCP docs

2021-10-05 Thread GitBox


vingov opened a new pull request #3753:
URL: https://github.com/apache/hudi/pull/3753


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *This PR adds a redirect page for quickstart so that the GCP docs links are 
handled correctly.*
   
   ## Brief change log
   
   *(for example:)*
 - *Added a new page for quickstart which directs to the latest docs 
quick-start page.*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [x] CI is green
   
- [x] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] nsivabalan merged pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-10-05 Thread GitBox


nsivabalan merged pull request #3590:
URL: https://github.com/apache/hudi/pull/3590


   






[GitHub] [hudi] nsivabalan edited a comment on pull request #3630: [HUDI-313] NPE when select count start from a realtime table

2021-10-05 Thread GitBox


nsivabalan edited a comment on pull request #3630:
URL: https://github.com/apache/hudi/pull/3630#issuecomment-935187682


   @codope : Did you get a chance to repro this as vinoth suggested?  






[GitHub] [hudi] nsivabalan commented on pull request #3630: [HUDI-313] NPE when select count start from a realtime table

2021-10-05 Thread GitBox


nsivabalan commented on pull request #3630:
URL: https://github.com/apache/hudi/pull/3630#issuecomment-935187682


   @codope : Did you try reproducing this as vinoth suggested?  






[GitHub] [hudi] nsivabalan commented on pull request #3646: [HUDI-349]: Added new cleaning policy based on number of hours

2021-10-05 Thread GitBox


nsivabalan commented on pull request #3646:
URL: https://github.com/apache/hudi/pull/3646#issuecomment-935186881


   @prashantwason : can you please address the feedback when you get a chance. 






[GitHub] [hudi] nsivabalan edited a comment on pull request #3648: [HUDI-2413] fix Sql source's checkpoint issue

2021-10-05 Thread GitBox


nsivabalan edited a comment on pull request #3648:
URL: https://github.com/apache/hudi/pull/3648#issuecomment-935186342


   @fengjian428 : can you please address the feedback when you get a chance. 






[GitHub] [hudi] nsivabalan commented on pull request #3648: [HUDI-2413] fix Sql source's checkpoint issue

2021-10-05 Thread GitBox


nsivabalan commented on pull request #3648:
URL: https://github.com/apache/hudi/pull/3648#issuecomment-935186342


   @fengjian428 : can you address the feedback when you get a chance. 






[GitHub] [hudi] nsivabalan commented on issue #3751: [SUPPORT] Slow Write Speeds to Hudi

2021-10-05 Thread GitBox


nsivabalan commented on issue #3751:
URL: https://github.com/apache/hudi/issues/3751#issuecomment-935142290


   If you are doing a lot of small writes, I would recommend looking at the MOR table 
type. Curious to know why you went with COW?
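
   For readers unfamiliar with the two table types, the sketch below shows one way the choice could be expressed as a write option. It is only an illustration: `hoodie.datasource.write.table.type` with the values `COPY_ON_WRITE` and `MERGE_ON_READ` is the option as I understand it from the Hudi write configs, while the class name, the method, and the `frequentSmallWrites` flag are hypothetical and not part of Hudi.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper only; not part of the Hudi code base.
final class TableTypeOptionSketch {
    // Assumed to be the Hudi datasource option that selects the table type.
    static final String TABLE_TYPE_KEY = "hoodie.datasource.write.table.type";

    // frequentSmallWrites is a hypothetical flag standing in for
    // "the workload issues many small writes".
    static Map<String, String> writeOptions(boolean frequentSmallWrites) {
        Map<String, String> options = new LinkedHashMap<>();
        // MOR appends updates to log files and compacts later;
        // COW rewrites base files on every write.
        options.put(TABLE_TYPE_KEY, frequentSmallWrites ? "MERGE_ON_READ" : "COPY_ON_WRITE");
        return options;
    }
}
```

   The returned map would typically be merged into the other writer options; whether MOR actually helps depends on the workload and on how compaction is scheduled.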






[GitHub] [hudi] nsivabalan commented on issue #3749: [SUPPORT] is there solution to solve hbase data screw issue?

2021-10-05 Thread GitBox


nsivabalan commented on issue #3749:
URL: https://github.com/apache/hudi/issues/3749#issuecomment-935138968


   I assume you got your answer from the dev mailing list? If yes, can we close 
this out?






[GitHub] [hudi] nsivabalan commented on issue #3747: [SUPPORT] Hive Sync process stuck and unable to exit

2021-10-05 Thread GitBox


nsivabalan commented on issue #3747:
URL: https://github.com/apache/hudi/issues/3747#issuecomment-935136426


   @codope can you follow up on this.






[GitHub] [hudi] nsivabalan commented on issue #3737: how can we migrate a legacy COW table into MOR table

2021-10-05 Thread GitBox


nsivabalan commented on issue #3737:
URL: https://github.com/apache/hudi/issues/3737#issuecomment-935135814


   @vinothchandar @bvaradar @n3nash @bhasudha : any recommendations here. 






[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

2021-10-05 Thread GitBox


nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-935132854


   Can you set these configs as well
   hoodie.write.lock.wait_time_ms_between_retry=2000
   hoodie.write.lock.hivemetastore.uris= 
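
   As a rough sketch, the two settings above could be collected into a `java.util.Properties` object before being handed to the writer. The keys are copied from the comment; the class name and the `metastoreUris` parameter are hypothetical, and the URI value is left to the caller because the comment does not give one.

```java
import java.util.Properties;

// Illustrative helper only; not part of the Hudi code base.
final class LockRetrySettingsSketch {
    // Keys copied verbatim from the comment above.
    static final String WAIT_BETWEEN_RETRY_KEY = "hoodie.write.lock.wait_time_ms_between_retry";
    static final String METASTORE_URIS_KEY = "hoodie.write.lock.hivemetastore.uris";

    // metastoreUris must be supplied by the caller; the comment leaves it blank.
    static Properties build(String metastoreUris) {
        if (metastoreUris == null || metastoreUris.trim().isEmpty()) {
            throw new IllegalArgumentException("metastore URIs must be provided");
        }
        Properties props = new Properties();
        // 2000 ms between lock retries, as suggested in the comment.
        props.setProperty(WAIT_BETWEEN_RETRY_KEY, "2000");
        props.setProperty(METASTORE_URIS_KEY, metastoreUris.trim());
        return props;
    }
}
```

   Rejecting an empty URI early is a design choice of this sketch, so that a blank value like the one shown above is not passed through silently.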
   






[GitHub] [hudi] nsivabalan commented on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-10-05 Thread GitBox


nsivabalan commented on pull request #3590:
URL: https://github.com/apache/hudi/pull/3590#issuecomment-934887878


   @hudi-bot azure run






[GitHub] [hudi] hudi-bot edited a comment on pull request #3752: [HUDI-1294][WIP] Adding inline read for Hfile log blocks

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3752:
URL: https://github.com/apache/hudi/pull/3752#issuecomment-934816311


   
   ## CI report:
   
   * c4ad910dcdb1d36ac4f59be2ef36c91f8dcde5cb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2511)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3752: [HUDI-1294][WIP] Adding inline read for Hfile log blocks

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3752:
URL: https://github.com/apache/hudi/pull/3752#issuecomment-934816311


   
   ## CI report:
   
   * c4ad910dcdb1d36ac4f59be2ef36c91f8dcde5cb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2511)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3752: [HUDI-1294][WIP] Adding inline read for Hfile log blocks

2021-10-05 Thread GitBox


hudi-bot commented on pull request #3752:
URL: https://github.com/apache/hudi/pull/3752#issuecomment-934816311


   
   ## CI report:
   
   * c4ad910dcdb1d36ac4f59be2ef36c91f8dcde5cb UNKNOWN
   
   




[jira] [Updated] (HUDI-1294) Implement inlining of HFile Data Blocks in metadata table log

2021-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1294:
-
Labels: pull-request-available  (was: )

> Implement inlining of HFile Data Blocks in metadata table log
> -
>
> Key: HUDI-1294
> URL: https://issues.apache.org/jira/browse/HUDI-1294
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Common Core, Performance
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[GitHub] [hudi] nsivabalan opened a new pull request #3752: [HUDI-1294][WIP] Adding inline read for Hfile log blocks

2021-10-05 Thread GitBox


nsivabalan opened a new pull request #3752:
URL: https://github.com/apache/hudi/pull/3752


   ## What is the purpose of the pull request
   
   - Adding inline read for Hfile log blocks
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * 095d325883c4fcbf4acdc2fcab3065cbd8923690 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2510)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * fae23e34d01a8f6d6f56503ebfe6b28c1362c530 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2509)
 
   * 095d325883c4fcbf4acdc2fcab3065cbd8923690 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2510)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * fae23e34d01a8f6d6f56503ebfe6b28c1362c530 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2509)
 
   * 095d325883c4fcbf4acdc2fcab3065cbd8923690 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * fae23e34d01a8f6d6f56503ebfe6b28c1362c530 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2509)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * f744c0b01de34df48d29508f1cb142a9d16abc99 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2479)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2480)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2484)
 
   * fae23e34d01a8f6d6f56503ebfe6b28c1362c530 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2509)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * f744c0b01de34df48d29508f1cb142a9d16abc99 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2479)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2480)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2484)
 
   * fae23e34d01a8f6d6f56503ebfe6b28c1362c530 UNKNOWN
   
   




[GitHub] [hudi] yihua commented on a change in pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-05 Thread GitBox


yihua commented on a change in pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#discussion_r722394022



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##
@@ -18,39 +18,258 @@
 
 package org.apache.hudi.table.action.compact;
 
+import org.apache.hudi.avro.model.HoodieCompactionOperation;
 import org.apache.hudi.avro.model.HoodieCompactionPlan;
+import org.apache.hudi.client.AbstractHoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.data.HoodieAccumulator;
+import org.apache.hudi.common.data.HoodieData;
 import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.engine.TaskContextSupplier;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.CompactionOperation;
+import org.apache.hudi.common.model.HoodieBaseFile;
 import org.apache.hudi.common.model.HoodieFileGroupId;
+import org.apache.hudi.common.model.HoodieLogFile;
 import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.model.HoodieWriteStat.RuntimeStats;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.view.TableFileSystemView.SliceView;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.CompactionUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.io.IOUtils;
+import org.apache.hudi.table.HoodieCopyOnWriteTableOperation;
 import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.compact.strategy.CompactionStrategy;
+
+import org.apache.avro.Schema;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
 
 import java.io.IOException;
 import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
 import java.util.Set;
+import java.util.stream.StreamSupport;
+
+import static java.util.stream.Collectors.toList;
 
 /**
  * A HoodieCompactor runs compaction on a hoodie table.
  */
-public interface HoodieCompactor 
extends Serializable {
+public abstract class HoodieCompactor 
implements Serializable {
+
+  private static final Logger LOG = 
LogManager.getLogger(HoodieCompactor.class);
+
+  public abstract Schema getReaderSchema(HoodieWriteConfig config);
+
+  public abstract void updateReaderSchema(HoodieWriteConfig config, 
HoodieTableMetaClient metaClient);
+
+  public abstract void checkCompactionTimeline(

Review comment:
   Actually, the first line is different.  I renamed the method to 
`handleCompactionTimeline()` and extracted the second line.

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##
@@ -18,39 +18,258 @@
 
 package org.apache.hudi.table.action.compact;
 
+import org.apache.hudi.avro.model.HoodieCompactionOperation;
 import org.apache.hudi.avro.model.HoodieCompactionPlan;
+import org.apache.hudi.client.AbstractHoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.data.HoodieAccumulator;
+import org.apache.hudi.common.data.HoodieData;
 import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.engine.TaskContextSupplier;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.CompactionOperation;
+import org.apache.hudi.common.model.HoodieBaseFile;
 import org.apache.hudi.common.model.HoodieFileGroupId;
+import org.apache.hudi.common.model.HoodieLogFile;
 import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.model.HoodieWriteStat.RuntimeStats;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.view.TableFileSystemView.SliceView;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.CompactionUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.common.util.collection.Pair;
 

[GitHub] [hudi] MikeBuh opened a new issue #3751: [SUPPORT] Slow Write Speeds to Hudi

2021-10-05 Thread GitBox


MikeBuh opened a new issue #3751:
URL: https://github.com/apache/hudi/issues/3751


   Hello,
   We would like your assistance in order to better understand how to tweak our 
current setup to achieve better performance. Please find our case scenario 
below: 
   
   We are in the process of building a data lake using Hudi so that we can 
have updated records with near-real-time availability. The source data 
resides on S3 and consists of many small files in Avro format; in essence, 
there is one file for each Kafka message sent to us from an external 
source. To process the data and persist it to Hudi, we have a Spark application 
running on EMR that consumes the data via Structured Streaming and does some 
basic filtering and conversions before performing an UPSERT operation. 
   
   When the data is read into the application, the schema is obtained through 
the Confluent Schema Registry and AvroDeserializer. 
   `val inputDF = spark.readStream.format("avro").schema(schema).load(s"$dataSource/$topicName/")`
   
   After this, some basic processing is performed to remove some of the fields 
and to compute the key, calendardate, and eventtime from the input message. 
Finally, the data is written to Hudi as follows: 
   
   ```
   val hudiOptions: Map[String, String] = Map(
     HoodieWriteConfig.TABLE_NAME -> s"hudi_$topicName",
     DataSourceWriteOptions.TABLE_TYPE_OPT_KEY -> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
     DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL,
     DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "key",
     DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "calendardate",
     DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "eventtime",
     HoodieWriteConfig.UPSERT_PARALLELISM -> "10",
     HoodieWriteConfig.INSERT_PARALLELISM -> "10"
   )
   
   val hudiCompactOptions: Map[String, String] = Map(
     HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT_BYTES -> "10485760",
     HoodieCompactionConfig.INLINE_COMPACT_PROP -> "true",
     HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS_PROP -> "15",
     HoodieCompactionConfig.CLEANER_COMMITS_RETAINED_PROP -> "12",
     HoodieCompactionConfig.MIN_COMMITS_TO_KEEP_PROP -> "15",
     HoodieCompactionConfig.MAX_COMMITS_TO_KEEP_PROP -> "18"
   )
   
   val processBatch: (DataFrame, Long) => Unit = (df, _) => {
     df.write
       .format("org.apache.hudi")
       .options(hudiOptions)
       .options(hudiCompactOptions)
       .mode(SaveMode.Append)
       .save(s"$destination/$topicName")
   }
   
   df.writeStream
     .trigger(Trigger.ProcessingTime("120 seconds"))
     .foreachBatch(processBatch)
     .start()
   ```
   
   Execution Performance
   > Data Input: 40GB across 650K files
   > Hardware: 1 x Master 4C 32G | 4 x Core 4C 32G
   > Allocated Resources:  driver-memory 8g | executor-memory 4g | executor-cores 2 | num-executors 2
   > Execution Time: 2hrs+ 
   
   Given the above, we have reason to believe that higher performance could 
be achieved by tuning various parameters to better suit the input data. 
   
   Last but not least, an instance of this app will run for each topic that we 
have and the heavier ones have around 6GB of data each day split between 95K 
files. 
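
For context, the figures quoted above can be turned into a rough per-file size and task-slot estimate. The sketch below is only illustrative: it assumes binary units (1 GB = 1024 x 1024 KB) and Spark's default of one core per task, and the class and method names are made up for this example.

```java
// Back-of-the-envelope figures derived from the numbers quoted in the report.
// Assumptions: binary units (1 GB = 1024 * 1024 KB) and one core per Spark task.
// The class and method names are illustrative only.
final class SmallFileEstimate {

  /** Average file size in KB for totalGb gigabytes spread over fileCount files. */
  static double averageFileSizeKb(double totalGb, long fileCount) {
    return totalGb * 1024 * 1024 / fileCount;
  }

  /** Number of tasks that can run at the same time with the given executor settings. */
  static int concurrentTasks(int numExecutors, int executorCores) {
    return numExecutors * executorCores;
  }

  public static void main(String[] args) {
    // 40GB across 650K files (full load) and 6GB across 95K files (heavier topics, per day).
    System.out.printf("full load:   %.1f KB per file%n", averageFileSizeKb(40, 650_000));
    System.out.printf("daily topic: %.1f KB per file%n", averageFileSizeKb(6, 95_000));
    // num-executors 2, executor-cores 2.
    System.out.println("concurrent tasks: " + concurrentTasks(2, 2));
  }
}
```

With the quoted numbers this works out to roughly 64 to 66 KB per input file in both cases, and four concurrently running tasks against a configured upsert/insert parallelism of 10.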
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3750: [HUDI-2530] Adding async compaction support to integ test suite framework

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3750:
URL: https://github.com/apache/hudi/pull/3750#issuecomment-934504077


   
   ## CI report:
   
   * 6f2bc4169918ef270a1982605ffe6025be466f3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2508)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3750: [HUDI-2530] Adding async compaction support to integ test suite framework

2021-10-05 Thread GitBox


hudi-bot edited a comment on pull request #3750:
URL: https://github.com/apache/hudi/pull/3750#issuecomment-934504077


   
   ## CI report:
   
   * 6f2bc4169918ef270a1982605ffe6025be466f3e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2508)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #3750: [HUDI-2530] Adding async compaction support to integ test suite framework

2021-10-05 Thread GitBox


hudi-bot commented on pull request #3750:
URL: https://github.com/apache/hudi/pull/3750#issuecomment-934504077


   
   ## CI report:
   
   * 6f2bc4169918ef270a1982605ffe6025be466f3e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-2530) Add async compaction support to integ test suite infra

2021-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2530:
-
Labels: pull-request-available  (was: )

> Add async compaction support to integ test suite infra
> --
>
> Key: HUDI-2530
> URL: https://issues.apache.org/jira/browse/HUDI-2530
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Add async compaction support to integ test suite infra



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan opened a new pull request #3750: [HUDI-2530] Adding async compaction support to integ test suite framework

2021-10-05 Thread GitBox


nsivabalan opened a new pull request #3750:
URL: https://github.com/apache/hudi/pull/3750


   ## What is the purpose of the pull request
   
   Fixed support for async compaction in the integ test suite framework. 
   Added YAML and properties files to assist in testing async compaction. 
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-1236) [UMBRELLA] Integ Test suite infra

2021-10-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1236:
--
Summary: [UMBRELLA] Integ Test suite infra   (was: [UMBRELLA] Long running 
test suite)

> [UMBRELLA] Integ Test suite infra 
> --
>
> Key: HUDI-1236
> URL: https://issues.apache.org/jira/browse/HUDI-1236
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Testing
>Affects Versions: 0.9.0
>Reporter: sivabalan narayanan
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: hudi-umbrellas
>
> Long running test suite that checks for correctness across all deployment 
> modes (batch/streaming) and writers (deltastreamer/spark) and readers (hive, 
> presto, spark)





[jira] [Updated] (HUDI-2530) Add async compaction support to integ test suite infra

2021-10-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2530:
--
Parent: HUDI-1236
Issue Type: Sub-task  (was: Improvement)

> Add async compaction support to integ test suite infra
> --
>
> Key: HUDI-2530
> URL: https://issues.apache.org/jira/browse/HUDI-2530
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> Add async compaction support to integ test suite infra





[jira] [Assigned] (HUDI-2530) Add async compaction support to integ test suite infra

2021-10-05 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2530:
-

Assignee: sivabalan narayanan

> Add async compaction support to integ test suite infra
> --
>
> Key: HUDI-2530
> URL: https://issues.apache.org/jira/browse/HUDI-2530
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> Add async compaction support to integ test suite infra





[jira] [Created] (HUDI-2530) Add async compaction support to integ test suite infra

2021-10-05 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-2530:
-

 Summary: Add async compaction support to integ test suite infra
 Key: HUDI-2530
 URL: https://issues.apache.org/jira/browse/HUDI-2530
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Testing
Reporter: sivabalan narayanan


Add async compaction support to integ test suite infra





[GitHub] [hudi] nsivabalan commented on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-10-05 Thread GitBox


nsivabalan commented on pull request #3590:
URL: https://github.com/apache/hudi/pull/3590#issuecomment-934478106


   @hudi-bot azure run






[GitHub] [hudi] novakov-alexey commented on issue #3647: [SUPPORT] Failed to read parquet file during upsert

2021-10-05 Thread GitBox


novakov-alexey commented on issue #3647:
URL: https://github.com/apache/hudi/issues/3647#issuecomment-934473332


   The row writer gives a huge speed-up with bulk_insert. Looking forward to a 
proper fix for this issue. 






[GitHub] [hudi] nsivabalan commented on pull request #3691: [HUDI-2455] Adding spark_avro dependency to hudi-integ-test

2021-10-05 Thread GitBox


nsivabalan commented on pull request #3691:
URL: https://github.com/apache/hudi/pull/3691#issuecomment-934384095


   @YannByron : so apart from building/packaging, were you looking for any 
assistance in running the test suite jobs as such? 






[GitHub] [hudi] ChaladiMohanVamsi commented on issue #3647: [SUPPORT] Failed to read parquet file during upsert

2021-10-05 Thread GitBox


ChaladiMohanVamsi commented on issue #3647:
URL: https://github.com/apache/hudi/issues/3647#issuecomment-934371202


   https://issues.apache.org/jira/browse/HUDI-2526
   
   I faced a similar issue. The community team found that the legacy writer 
config is hard-coded to false, so the external config is ignored.
   The workaround is to disable the row writer while using bulk_insert.






[jira] [Commented] (HUDI-1657) build failed on AArch64, Fedora 33

2021-10-05 Thread Tejaswini edara (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424425#comment-17424425
 ] 

Tejaswini edara commented on HUDI-1657:
---

[ERROR] Failed to execute goal 
net.alchim31.maven:scala-maven-plugin:3.3.1:compile (scala-compile-first) on 
project hudi-spark-client: wrap: org.apache.commons.exec.ExecuteException: 
Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn  -rf :hudi-spark-client


My Java version:

```
(base) ➜ hudi git:(master) ✗ java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
```

My Maven version:

```
Maven home: /usr/local/Cellar/maven/3.8.3/libexec
Java version: 1.8.0_202, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home/jre
Default locale: en_IN, platform encoding: UTF-8
OS name: "mac os x", version: "10.16", arch: "x86_64", family: "mac"
```

Can someone help me resolve this issue?

> build failed on AArch64, Fedora 33 
> ---
>
> Key: HUDI-1657
> URL: https://issues.apache.org/jira/browse/HUDI-1657
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lutz Weischer
>Priority: Major
>  Labels: sev:triage, user-support-issues
>
> [jw@cn05 hudi]$ mvn package -DskipTests
> [INFO] Scanning for projects...
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-java-client:jar:0.8.0-SNAPSHOT
> [WARNING] The expression ${parent.version} is deprecated. Please use 
> ${project.parent.version} instead.
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark-client:jar:0.8.0-SNAPSHOT
> [WARNING] The expression ${parent.version} is deprecated. Please use 
> ${project.parent.version} instead.
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-flink-client:jar:0.8.0-SNAPSHOT
> [WARNING] The expression ${parent.version} is deprecated. Please use 
> ${project.parent.version} instead.
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/hudi-spark-datasource/hudi-spark/pom.xml, line 26, 
> column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark2_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark2_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/hudi-spark-datasource/hudi-spark2/pom.xml, line 24, 
> column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-utilities_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-utilities_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/hudi-utilities/pom.xml, line 26, column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-spark-bundle_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-spark-bundle_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/packaging/hudi-spark-bundle/pom.xml, line 26, column 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-utilities-bundle_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-utilities-bundle_${scala.binary.version}:0.8.0-SNAPSHOT, 
> /home/jw/apache/hudi/packaging/hudi-utilities-bundle/pom.xml, line 26, column 
> 15
> [WARNING]
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hudi:hudi-flink_2.11:jar:0.8.0-SNAPSHOT
> [WARNING] 'artifactId' contains an expression but should be a constant. @ 
> org.apache.hudi:hudi-flink_${scala.binary.version}:0.8.0

[GitHub] [hudi] novakov-alexey commented on issue #3647: [SUPPORT] Failed to read parquet file during upsert

2021-10-05 Thread GitBox


novakov-alexey commented on issue #3647:
URL: https://github.com/apache/hudi/issues/3647#issuecomment-934330091


   @xushiyan yes, it seems you are right. I can confirm that if I write to a 
table with bulk_insert and `hoodie.datasource.write.row.writer.enable=false`, 
then upsert is working afterwards. So this is a bug with row.writer flag.
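
The configuration described above could be expressed as the following option map. This is a minimal sketch: `hoodie.datasource.write.row.writer.enable` is the key quoted in this comment, `hoodie.datasource.write.operation` is assumed to be the standard Hudi write-operation key, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that collects the write options for the workaround:
// bulk_insert with the row writer explicitly disabled.
final class BulkInsertWorkaroundOptions {

  static Map<String, String> options() {
    Map<String, String> opts = new LinkedHashMap<>();
    // Assumed standard key for selecting the write operation.
    opts.put("hoodie.datasource.write.operation", "bulk_insert");
    // Key quoted in the comment above; "false" disables the row writer path.
    opts.put("hoodie.datasource.write.row.writer.enable", "false");
    return opts;
  }

  public static void main(String[] args) {
    // Print the options in insertion order, one key=value pair per line.
    options().forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```

Such a map could be passed to a Spark writer with `df.write().format("org.apache.hudi").options(BulkInsertWorkaroundOptions.options())`, alongside the table's usual record key, partition path, and table name options.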






[jira] [Updated] (HUDI-2498) Support Hive sync to work with s3

2021-10-05 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-2498:
-
Labels: sev:critical user-support-issues  (was: )

> Support Hive sync to work with s3
> -
>
> Key: HUDI-2498
> URL: https://issues.apache.org/jira/browse/HUDI-2498
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: Vinay
>Assignee: Vinay
>Priority: Major
>  Labels: sev:critical, user-support-issues
>
> Currently, Hive sync does not work with S3 out of the box; we have to add 
> dependencies explicitly to the run_hive_sync script to make it work. 
>  
> It works fine on EMR but does not work with standalone clusters





[jira] [Updated] (HUDI-2529) Flaky test: ITTestHoodieFlinkCompactor.testHoodieFlinkCompactor:88

2021-10-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2529:
-
Attachment: 27.txt

> Flaky test: ITTestHoodieFlinkCompactor.testHoodieFlinkCompactor:88
> --
>
> Key: HUDI-2529
> URL: https://issues.apache.org/jira/browse/HUDI-2529
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Priority: Major
> Attachments: 27.txt
>
>
> {code:java}
> 2021-09-30T16:45:30.4276182Z 12557 [pool-15-thread-2] ERROR 
> org.apache.hudi.common.table.view.PriorityBasedFileSystemView  - Got error 
> running preferred function. Trying secondary
> 2021-09-30T16:45:30.4276903Z org.apache.hudi.exception.HoodieRemoteException: 
> Connect to 0.0.0.0:46865 [/0.0.0.0] failed: Connection refused (Connection 
> refused)
> 2021-09-30T16:45:30.4277581Z  at 
> org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestFileSlice(RemoteHoodieTableFileSystemView.java:297)
> 2021-09-30T16:45:30.4278221Z  at 
> org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:97)
> 2021-09-30T16:45:30.4278827Z  at 
> org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestFileSlice(PriorityBasedFileSystemView.java:252)
> 2021-09-30T16:45:30.4279399Z  at 
> org.apache.hudi.io.HoodieAppendHandle.init(HoodieAppendHandle.java:135)
> 2021-09-30T16:45:30.4279873Z  at 
> org.apache.hudi.io.HoodieAppendHandle.write(HoodieAppendHandle.java:390)
> 2021-09-30T16:45:30.4280347Z  at 
> org.apache.hudi.io.HoodieWriteHandle.write(HoodieWriteHandle.java:215)
> 2021-09-30T16:45:30.4280863Z  at 
> org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:96)
> 2021-09-30T16:45:30.4281447Z  at 
> org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:40)
> 2021-09-30T16:45:30.4282039Z  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
> 2021-09-30T16:45:30.4282624Z  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
> 2021-09-30T16:45:30.4283129Z  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2021-09-30T16:45:30.4283590Z  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 2021-09-30T16:45:30.4284080Z  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2021-09-30T16:45:30.4284502Z  at java.lang.Thread.run(Thread.java:748)
> 2021-09-30T16:45:30.4298786Z Caused by: 
> org.apache.http.conn.HttpHostConnectException: Connect to 0.0.0.0:46865 
> [/0.0.0.0] failed: Connection refused (Connection refused)
> 2021-09-30T16:45:30.4299596Z  at 
> org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
> 2021-09-30T16:45:30.4300229Z  at 
> org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
> 2021-09-30T16:45:30.4300808Z  at 
> org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
> 2021-09-30T16:45:30.4301322Z  at 
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
> 2021-09-30T16:45:30.4301804Z  at 
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
> 2021-09-30T16:45:30.4302279Z  at 
> org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
> 2021-09-30T16:45:30.4302751Z  at 
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
> 2021-09-30T16:45:30.4303239Z  at 
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
> 2021-09-30T16:45:30.4303940Z  at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
> 2021-09-30T16:45:30.4304463Z  at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
> 2021-09-30T16:45:30.4304983Z  at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
> 2021-09-30T16:45:30.4305450Z  at 
> org.apache.http.client.fluent.Request.execute(Request.java:151)
> 2021-09-30T16:45:30.4306006Z  at 
> org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:172)
> 2021-09-30T16:45:30.4306671Z  at 
> org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestFileSlice(RemoteHoodieTableFileSystemView.java:293)
> 2021-09-30T16:45:30.4307194Z  ... 13 more
> 2021-09-30T16:45:30.4307537Z Caused by: java.net.ConnectException: Connection 
> refused (Connection refused)
> 2021-09-30T16:45:30.4307945Z  at 
> java.net.PlainSocketImpl.socke

[jira] [Updated] (HUDI-2529) Flaky test: ITTestHoodieFlinkCompactor.testHoodieFlinkCompactor:88

2021-10-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2529:
-
Description: 
{code:java}
2021-09-30T16:45:30.4276182Z 12557 [pool-15-thread-2] ERROR 
org.apache.hudi.common.table.view.PriorityBasedFileSystemView  - Got error 
running preferred function. Trying secondary
2021-09-30T16:45:30.4276903Z org.apache.hudi.exception.HoodieRemoteException: 
Connect to 0.0.0.0:46865 [/0.0.0.0] failed: Connection refused (Connection 
refused)
2021-09-30T16:45:30.4277581Z  at 
org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestFileSlice(RemoteHoodieTableFileSystemView.java:297)
2021-09-30T16:45:30.4278221Z  at 
org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:97)
2021-09-30T16:45:30.4278827Z  at 
org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestFileSlice(PriorityBasedFileSystemView.java:252)
2021-09-30T16:45:30.4279399Z  at 
org.apache.hudi.io.HoodieAppendHandle.init(HoodieAppendHandle.java:135)
2021-09-30T16:45:30.4279873Z  at 
org.apache.hudi.io.HoodieAppendHandle.write(HoodieAppendHandle.java:390)
2021-09-30T16:45:30.4280347Z  at 
org.apache.hudi.io.HoodieWriteHandle.write(HoodieWriteHandle.java:215)
2021-09-30T16:45:30.4280863Z  at 
org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:96)
2021-09-30T16:45:30.4281447Z  at 
org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:40)
2021-09-30T16:45:30.4282039Z  at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
2021-09-30T16:45:30.4282624Z  at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
2021-09-30T16:45:30.4283129Z  at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
2021-09-30T16:45:30.4283590Z  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2021-09-30T16:45:30.4284080Z  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2021-09-30T16:45:30.4284502Z  at java.lang.Thread.run(Thread.java:748)
2021-09-30T16:45:30.4298786Z Caused by: 
org.apache.http.conn.HttpHostConnectException: Connect to 0.0.0.0:46865 
[/0.0.0.0] failed: Connection refused (Connection refused)
2021-09-30T16:45:30.4299596Z  at 
org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
2021-09-30T16:45:30.4300229Z  at 
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
2021-09-30T16:45:30.4300808Z  at 
org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
2021-09-30T16:45:30.4301322Z  at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
2021-09-30T16:45:30.4301804Z  at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
2021-09-30T16:45:30.4302279Z  at 
org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
2021-09-30T16:45:30.4302751Z  at 
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
2021-09-30T16:45:30.4303239Z  at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
2021-09-30T16:45:30.4303940Z  at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
2021-09-30T16:45:30.4304463Z  at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
2021-09-30T16:45:30.4304983Z  at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
2021-09-30T16:45:30.4305450Z  at 
org.apache.http.client.fluent.Request.execute(Request.java:151)
2021-09-30T16:45:30.4306006Z  at 
org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:172)
2021-09-30T16:45:30.4306671Z  at 
org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getLatestFileSlice(RemoteHoodieTableFileSystemView.java:293)
2021-09-30T16:45:30.4307194Z  ... 13 more
2021-09-30T16:45:30.4307537Z Caused by: java.net.ConnectException: Connection 
refused (Connection refused)
2021-09-30T16:45:30.4307945Z  at 
java.net.PlainSocketImpl.socketConnect(Native Method)
2021-09-30T16:45:30.4308362Z  at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
2021-09-30T16:45:30.4315903Z  at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
2021-09-30T16:45:30.4316643Z  at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
2021-09-30T16:45:30.4317099Z  at 
java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
2021-09-30T16:45:30.4317496Z  at java.net.Socket.connect(Soc

[jira] [Closed] (HUDI-2075) Flaky test: TestRowDataToHoodieFunction

2021-10-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2075.

Resolution: Cannot Reproduce

Don't see this in Azure. Re-open if this comes back.

> Flaky test: TestRowDataToHoodieFunction
> ---
>
> Key: HUDI-2075
> URL: https://issues.apache.org/jira/browse/HUDI-2075
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Testing
> Reporter: Raymond Xu
> Priority: Major
>
> At least 10 occurrences
> [ERROR] Failures: 
> [ERROR]   TestRowDataToHoodieFunction.testRateLimit:72 should process at least 5 seconds ==> expected:  but was: 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-2078) Flaky test: TestCleaner

2021-10-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2078.

Resolution: Cannot Reproduce

Don't see this in Azure. Re-open if this comes back.

> Flaky test: TestCleaner
> ---
>
> Key: HUDI-2078
> URL: https://issues.apache.org/jira/browse/HUDI-2078
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Testing
> Reporter: Raymond Xu
> Priority: Major
>
> * TestCleaner.testKeepLatestCommits
> * TestCleaner.testKeepLatestFileVersions:673 Must clean at least 1 file ==> expected: <2> but was: <1>





[jira] [Closed] (HUDI-2076) Flaky test: TestHoodieMultiTableDeltaStreamer

2021-10-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2076.

Resolution: Cannot Reproduce

Don't see this in Azure. Re-open if this comes back.

> Flaky test: TestHoodieMultiTableDeltaStreamer
> -
>
> Key: HUDI-2076
> URL: https://issues.apache.org/jira/browse/HUDI-2076
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Raymond Xu
> Priority: Major
>
> At least 4 occurrences
> [ERROR] Failures: 
> [ERROR]   TestHoodieMultiTableDeltaStreamer.testMultiTableExecutionWithKafkaSource:168 expected:  but was: 





[jira] [Updated] (HUDI-2077) Flaky test: TestHoodieDeltaStreamer

2021-10-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2077:
-
Description: 
{code:java}
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   TestHoodieDeltaStreamer.testUpsertsMORContinuousModeWithMultipleWriters:716->testUpsertsContinuousModeWithMultipleWriters:831->runJobsInParallel:940 » Execution
{code}
 Search "testUpsertsMORContinuousModeWithMultipleWriters" in the log file for 
details.

  was:
{code:java}
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   TestHoodieDeltaStreamer.testUpsertsMORContinuousModeWithMultipleWriters:716->testUpsertsContinuousModeWithMultipleWriters:831->runJobsInParallel:940 » Execution
{code}
 
{code:java}
2021-10-01T15:38:36.7776781Z [ERROR] org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.testUpsertsMORContinuousModeWithMultipleWriters Time elapsed: 57.945 s <<< ERROR!
2021-10-01T15:38:36.7778593Z java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.hudi.exception.HoodieIOException: Failed to create file hdfs://localhost:46579/user/vsts/continuous_mor_mulitwriter/.hoodie/20211001153821.commit
2021-10-01T15:38:36.7780175Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.runJobsInParallel(TestHoodieDeltaStreamer.java:926)
2021-10-01T15:38:36.7781191Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.testUpsertsContinuousModeWithMultipleWriters(TestHoodieDeltaStreamer.java:818)
2021-10-01T15:38:36.7782459Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.testUpsertsMORContinuousModeWithMultipleWriters(TestHoodieDeltaStreamer.java:703)
2021-10-01T15:38:36.7783719Z Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieIOException: Failed to create file hdfs://localhost:46579/user/vsts/continuous_mor_mulitwriter/.hoodie/20211001153821.commit
2021-10-01T15:38:36.7784928Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.lambda$runJobsInParallel$10(TestHoodieDeltaStreamer.java:923)
2021-10-01T15:38:36.7786069Z Caused by: org.apache.hudi.exception.HoodieIOException: Failed to create file hdfs://localhost:46579/user/vsts/continuous_mor_mulitwriter/.hoodie/20211001153821.commit
2021-10-01T15:38:36.7787955Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.lambda$runJobsInParallel$10(TestHoodieDeltaStreamer.java:921)
2021-10-01T15:38:36.7789094Z Caused by: org.apache.hadoop.fs.FileAlreadyExistsException:
2021-10-01T15:38:36.7789863Z /user/vsts/continuous_mor_mulitwriter/.hoodie/20211001153821.commit for client 127.0.0.1 already exists
2021-10-01T15:38:36.7790732Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2563)
2021-10-01T15:38:36.7791637Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2450)
2021-10-01T15:38:36.7793026Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2334)
2021-10-01T15:38:36.7794034Z at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:624)
2021-10-01T15:38:36.7795041Z at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
2021-10-01T15:38:36.7796077Z at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
2021-10-01T15:38:36.7797974Z at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
2021-10-01T15:38:36.7798852Z at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
2021-10-01T15:38:36.7799527Z at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
2021-10-01T15:38:36.7800188Z at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
2021-10-01T15:38:36.7800789Z at java.security.AccessController.doPrivileged(Native Method)
2021-10-01T15:38:36.7801386Z at javax.security.auth.Subject.doAs(Subject.java:422)
2021-10-01T15:38:36.7802258Z at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
2021-10-01T15:38:36.7802948Z at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
2021-10-01T15:38:36.7803676Z
2021-10-01T15:38:36.7804333Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.lambda$runJobsInParallel$10(TestHoodieDeltaStreamer.java:921)
2021-10-01T15:38:36.7805070Z Caused by: org.apache.hadoop.ipc.RemoteException:
2021-10-01T15:38:36.7805712Z /user/vsts/continuous_mor_mulitwriter/.hoodie/20211001153821.commit for client 127.0.0.1 already exists
2021-10-01T15:38:36.7806633Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2563)
2021-10-01T15:38:36.7807422Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.ja

[jira] [Updated] (HUDI-2077) Flaky test: TestHoodieDeltaStreamer

2021-10-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2077:
-
Attachment: 28.txt

> Flaky test: TestHoodieDeltaStreamer
> ---
>
> Key: HUDI-2077
> URL: https://issues.apache.org/jira/browse/HUDI-2077
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Testing
> Reporter: Raymond Xu
> Assignee: Sagar Sumit
> Priority: Major
> Attachments: 28.txt
>
>
> {code:java}
> [INFO] Results:
> [INFO] 
> [ERROR] Errors: 
> [ERROR]   TestHoodieDeltaStreamer.testUpsertsMORContinuousModeWithMultipleWriters:716->testUpsertsContinuousModeWithMultipleWriters:831->runJobsInParallel:940 » Execution
> {code}
>  Search "testUpsertsMORContinuousModeWithMultipleWriters" in the log file for 
> details.





[jira] [Updated] (HUDI-2077) Flaky test: TestHoodieDeltaStreamer

2021-10-05 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2077:
-
Description: 
{code:java}
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   TestHoodieDeltaStreamer.testUpsertsMORContinuousModeWithMultipleWriters:716->testUpsertsContinuousModeWithMultipleWriters:831->runJobsInParallel:940 » Execution
{code}
 
{code:java}
2021-10-01T15:38:36.7776781Z [ERROR] org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.testUpsertsMORContinuousModeWithMultipleWriters Time elapsed: 57.945 s <<< ERROR!
2021-10-01T15:38:36.7778593Z java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.hudi.exception.HoodieIOException: Failed to create file hdfs://localhost:46579/user/vsts/continuous_mor_mulitwriter/.hoodie/20211001153821.commit
2021-10-01T15:38:36.7780175Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.runJobsInParallel(TestHoodieDeltaStreamer.java:926)
2021-10-01T15:38:36.7781191Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.testUpsertsContinuousModeWithMultipleWriters(TestHoodieDeltaStreamer.java:818)
2021-10-01T15:38:36.7782459Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.testUpsertsMORContinuousModeWithMultipleWriters(TestHoodieDeltaStreamer.java:703)
2021-10-01T15:38:36.7783719Z Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieIOException: Failed to create file hdfs://localhost:46579/user/vsts/continuous_mor_mulitwriter/.hoodie/20211001153821.commit
2021-10-01T15:38:36.7784928Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.lambda$runJobsInParallel$10(TestHoodieDeltaStreamer.java:923)
2021-10-01T15:38:36.7786069Z Caused by: org.apache.hudi.exception.HoodieIOException: Failed to create file hdfs://localhost:46579/user/vsts/continuous_mor_mulitwriter/.hoodie/20211001153821.commit
2021-10-01T15:38:36.7787955Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.lambda$runJobsInParallel$10(TestHoodieDeltaStreamer.java:921)
2021-10-01T15:38:36.7789094Z Caused by: org.apache.hadoop.fs.FileAlreadyExistsException:
2021-10-01T15:38:36.7789863Z /user/vsts/continuous_mor_mulitwriter/.hoodie/20211001153821.commit for client 127.0.0.1 already exists
2021-10-01T15:38:36.7790732Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2563)
2021-10-01T15:38:36.7791637Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2450)
2021-10-01T15:38:36.7793026Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2334)
2021-10-01T15:38:36.7794034Z at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:624)
2021-10-01T15:38:36.7795041Z at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
2021-10-01T15:38:36.7796077Z at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
2021-10-01T15:38:36.7797974Z at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
2021-10-01T15:38:36.7798852Z at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
2021-10-01T15:38:36.7799527Z at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
2021-10-01T15:38:36.7800188Z at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
2021-10-01T15:38:36.7800789Z at java.security.AccessController.doPrivileged(Native Method)
2021-10-01T15:38:36.7801386Z at javax.security.auth.Subject.doAs(Subject.java:422)
2021-10-01T15:38:36.7802258Z at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
2021-10-01T15:38:36.7802948Z at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
2021-10-01T15:38:36.7803676Z
2021-10-01T15:38:36.7804333Z at org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.lambda$runJobsInParallel$10(TestHoodieDeltaStreamer.java:921)
2021-10-01T15:38:36.7805070Z Caused by: org.apache.hadoop.ipc.RemoteException:
2021-10-01T15:38:36.7805712Z /user/vsts/continuous_mor_mulitwriter/.hoodie/20211001153821.commit for client 127.0.0.1 already exists
2021-10-01T15:38:36.7806633Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2563)
2021-10-01T15:38:36.7807422Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2450)
2021-10-01T15:38:36.7808170Z at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2334)
2021-10-01T15:38:36.7808949Z at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:624)
2021-10-01T15:38:36.7809836Z at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeP