[GitHub] [hudi] nsivabalan commented on pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-13 Thread GitBox


nsivabalan commented on pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#issuecomment-943066396


   @hudi-bot azure run


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-13 Thread GitBox


nsivabalan commented on pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#issuecomment-943061856


   @hudi-bot azure run






[GitHub] [hudi] nsivabalan commented on pull request #3794: [HUDI-2553] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-13 Thread GitBox


nsivabalan commented on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-943060753


   @hudi-bot azure run






[GitHub] [hudi] t822876884 opened a new issue #3796: [SUPPORT] Flink write to hudi,after running for a period of time,throw a NoClassDefFoundError

2021-10-13 Thread GitBox


t822876884 opened a new issue #3796:
URL: https://github.com/apache/hudi/issues/3796


   hudi 0.9.0
   flink 1.12.2
   
   ```java
   public static void main(String[] args) {
       // ENV
       StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
       env.setStateBackend(new FsStateBackend(YARN_CKP_PATH));
       env.enableCheckpointing(6);
       env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
       env.setParallelism(1);

       EnvironmentSettings settings = EnvironmentSettings.newInstance()
               .useBlinkPlanner()
               .inStreamingMode()
               .build();
       StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(env, settings);

       FlinkKafkaConsumer<String> consumer =
               new FlinkKafkaConsumer<>(KAFKA_TOPIC, new SimpleStringSchema(), kafkaProperties());
       consumer.setStartFromTimestamp(163379520L);

       // SOURCE
       DataStreamSource<String> yarnDS = env
               .addSource(consumer)
               .setParallelism(8);

       // Keep only "yarn" events, then parse the "data" payload into an entity.
       DataStream<YarnDataEntity> dataDs = yarnDS
               .filter(new FilterFunction<String>() {
                   @Override
                   public boolean filter(String value) throws Exception {
                       String type = JSONObject.parseObject(value).getString("type");
                       return "yarn".equals(type);
                   }
               }).setParallelism(4)
               .map(new MapFunction<String, YarnDataEntity>() {
                   @Override
                   public YarnDataEntity map(String value) throws Exception {
                       String data = JSONObject.parseObject(value).getString("data");
                       YarnDataEntity yarnDataEntities = JSONObject.parseObject(data, YarnDataEntity.class);
                       // Derive the partition column dt from the start time.
                       yarnDataEntities.setDt(DateUtil.convertTimeByLong(yarnDataEntities.getStartedTime()));
                       return yarnDataEntities;
                   }
               }).setParallelism(8);

       Table dataDsYarn = tableEnvironment.fromDataStream(dataDs);

       //Table result = tableEnvironment.sqlQuery("SELECT * FROM " + dataDsYarn);
       //tableEnvironment.toAppendStream(result, YarnDataEntity.class).print();

       // SINK
       tableEnvironment.executeSql("CREATE TABLE big_data_analyse_yarn(" +
               " allocatedMB INT," +
               " allocatedVCores INT," +
               " amContainerLogs VARCHAR(200)," +
               " amHostHttpAddress VARCHAR(200)," +
               " amNodeLabelExpression VARCHAR(200)," +
               " amRPCAddress VARCHAR(20)," +
               " appNodeLabelExpression VARCHAR(200)," +
               " applicationTags VARCHAR(200)," +
               " applicationType VARCHAR(20)," +
               " clusterId BIGINT," +
               " clusterUsagePercentage FLOAT," +
               " diagnostics VARCHAR(200)," +
               " dt VARCHAR(20)," +
               " elapsedTime BIGINT, " +
               " finalStatus VARCHAR(200)," +
               " finishedTime BIGINT," +
               " id VARCHAR(200)," +
               " logAggregationStatus VARCHAR(200)," +
               " memorySeconds BIGINT, " +
               " name VARCHAR(200)," +
               " numAMContainerPreempted INT, " +
               " numNonAMContainerPreempted INT, " +
               " preemptedResourceMB int," +
               " preemptedResourceVCores BIGINT, " +
               " priority VARCHAR(200)," +
               " progress FLOAT, " +
               " queue VARCHAR(200)," +
               " queueUsagePercentage FLOAT, " +
               " runningContainers INT, " +
               " startedTime BIGINT," +
               " `state` VARCHAR(200)," +
               " trackingUI VARCHAR(200)," +
               " trackingUrl VARCHAR(200)," +
               " unmanagedApplication boolean," +
               " `user` VARCHAR(20)," +
               " vcoreSeconds BIGINT" +
               ")" +
               " PARTITIONED BY (dt)" +
               " WITH (" +
               "  'connector' = 'hudi'," +
               "  'path' = '" + YARN_DATA_PATH + "'," +
               "  'write.tasks' = '8'," +
               "  'read.streaming.enabled'= 'true',  " +
               "  'table.type' = 'MERGE_ON_READ', " +
               "  'read.streaming.check-interval' = '30'," +
               "  'write.precombine.field' = 'dt'," +
               "  'hoodie.datasource.write.operation' = 'insert'," +
               "  'hoodie.datasource.write.recordkey.field' = 'id' " +
               " )");

       tableEnvironment.executeSql("insert into big_data_analyse_yarn select * from " + dataDsYarn);
   }
   ```
   
   ```
   org.ap

[GitHub] [hudi] hudi-bot edited a comment on pull request #3519: [DO NOT MERGE] 0.9.0 release patch for flink

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3519:
URL: https://github.com/apache/hudi/pull/3519#issuecomment-903204631


   
   ## CI report:
   
   * fd423c27cc15e112b99d8102ab7f5cb9a5d623c5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2637)
   * 1d3142cd55878ba81a358bf0b4d194779585bada Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2638)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3519: [DO NOT MERGE] 0.9.0 release patch for flink

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3519:
URL: https://github.com/apache/hudi/pull/3519#issuecomment-903204631


   
   ## CI report:
   
   * fd423c27cc15e112b99d8102ab7f5cb9a5d623c5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2637)
   * 1d3142cd55878ba81a358bf0b4d194779585bada UNKNOWN
   
   




[GitHub] [hudi] garyli1019 commented on a change in pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


garyli1019 commented on a change in pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#discussion_r728655818



##
File path: hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
##
@@ -189,6 +190,10 @@ public static HoodieWriteConfig getHoodieClientConfig(Configuration conf) {
 .enable(conf.getBoolean(FlinkOptions.METADATA_ENABLED))
 .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.METADATA_COMPACTION_DELTA_COMMITS))
 .build())
+.withPayloadConfig(HoodiePayloadConfig.newBuilder()
+.withPayloadOrderingField(conf.getString(FlinkOptions.PRECOMBINE_FIELD))
+.withPayloadEventTimeField(conf.getString(FlinkOptions.RECORD_KEY_FIELD))

Review comment:
   Hmm, I'm not sure I understand what "when it is needed" means. Even though 
users may use the same field for these two, they have completely different identities.








[jira] [Resolved] (HUDI-2551) Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread Danny Chen (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen resolved HUDI-2551.
--
Resolution: Fixed

Fixed via master branch: f897e6d73ebc26d32017774d452389023f53f742

> Support DefaultHoodieRecordPayload for flink
> 
>
> Key: HUDI-2551
> URL: https://issues.apache.org/jira/browse/HUDI-2551
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated: [HUDI-2551] Support DefaultHoodieRecordPayload for flink (#3792)

2021-10-13 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new f897e6d  [HUDI-2551] Support DefaultHoodieRecordPayload for flink (#3792)
f897e6d is described below

commit f897e6d73ebc26d32017774d452389023f53f742
Author: Danny Chan 
AuthorDate: Thu Oct 14 13:46:53 2021 +0800

[HUDI-2551] Support DefaultHoodieRecordPayload for flink (#3792)
---
 .../hudi/execution/FlinkLazyInsertIterable.java|  2 +-
 .../apache/hudi/configuration/FlinkOptions.java|  2 +-
 .../hudi/sink/bootstrap/BootstrapOperator.java |  7 ++
 .../bootstrap/batch/BatchBootstrapOperator.java|  5 
 .../java/org/apache/hudi/util/StreamerUtil.java|  5 
 .../apache/hudi/table/HoodieDataSourceITCase.java  | 29 ++
 6 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/execution/FlinkLazyInsertIterable.java b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/execution/FlinkLazyInsertIterable.java
index 8769f63..b0674b2 100644
--- a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/execution/FlinkLazyInsertIterable.java
+++ b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/execution/FlinkLazyInsertIterable.java
@@ -65,7 +65,7 @@ public class FlinkLazyInsertIterable extends Hood
 try {
   final Schema schema = new Schema.Parser().parse(hoodieConfig.getSchema());
   bufferedIteratorExecutor =
-  new BoundedInMemoryExecutor<>(hoodieConfig.getWriteBufferLimitBytes(), new IteratorBasedQueueProducer<>(inputItr), Option.of(getInsertHandler()), getTransformFunction(schema));
+  new BoundedInMemoryExecutor<>(hoodieConfig.getWriteBufferLimitBytes(), new IteratorBasedQueueProducer<>(inputItr), Option.of(getInsertHandler()), getTransformFunction(schema, hoodieConfig));
   final List result = bufferedIteratorExecutor.execute();
   assert result != null && !result.isEmpty() && !bufferedIteratorExecutor.isRemaining();
   return result;
diff --git a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
index 81bd517..b2359f4 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
@@ -100,7 +100,7 @@ public class FlinkOptions extends HoodieConfig {
   public static final ConfigOption METADATA_COMPACTION_DELTA_COMMITS = ConfigOptions
   .key("metadata.compaction.delta_commits")
   .intType()
-  .defaultValue(24)
+  .defaultValue(10)
   .withDescription("Max delta commits for metadata table to trigger compaction, default 24");
 
   // 
diff --git a/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java b/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java
index 3ac7aa1..0e7bb54 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java
@@ -129,6 +129,13 @@ public class BootstrapOperator
 WriteOperationType.fromValue(conf.getString(FlinkOptions.OPERATION)),
 HoodieTableType.valueOf(conf.getString(FlinkOptions.TABLE_TYPE)));
 
+preLoadIndexRecords();
+  }
+
+  /**
+   * Load the index records before {@link #processElement}.
+   */
+  protected void preLoadIndexRecords() throws Exception {
 String basePath = hoodieTable.getMetaClient().getBasePath();
 int taskID = getRuntimeContext().getIndexOfThisSubtask();
 LOG.info("Start loading records in table {} into the index state, taskId = {}", basePath, taskID);
diff --git a/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/batch/BatchBootstrapOperator.java b/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/batch/BatchBootstrapOperator.java
index ac4c2b1..258f884 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/batch/BatchBootstrapOperator.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/batch/BatchBootstrapOperator.java
@@ -57,6 +57,11 @@ public class BatchBootstrapOperator
   }
 
   @Override
+  protected void preLoadIndexRecords() {
+// no operation
+  }
+
+  @Override
   @SuppressWarnings("unchecked")
   public void processElement(StreamRecord element) throws Exception {
 final HoodieRecord record = (HoodieRecord) element.getValue();
diff --git a/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java b/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
index cfa2980..7fb550d 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.ja

[GitHub] [hudi] danny0405 merged pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


danny0405 merged pull request #3792:
URL: https://github.com/apache/hudi/pull/3792


   






[GitHub] [hudi] danny0405 commented on a change in pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-13 Thread GitBox


danny0405 commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r728651340



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##
@@ -66,6 +90,170 @@
 return HoodieRealtimeInputFormatUtils.getRealtimeSplits(job, fileSplits);
   }
 
+  /**
+   * Keep the logical of mor_incr_view as same as spark datasource.
+   * Step1: Get list of commits to be fetched based on start commit and max commits(for snapshot max commits is -1).

Review comment:
   `logical` is an adjective; please use the noun `logic` instead.








[GitHub] [hudi] hudi-bot edited a comment on pull request #3519: [DO NOT MERGE] 0.9.0 release patch for flink

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3519:
URL: https://github.com/apache/hudi/pull/3519#issuecomment-903204631


   
   ## CI report:
   
   * fd423c27cc15e112b99d8102ab7f5cb9a5d623c5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2637)
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#issuecomment-942232592


   
   ## CI report:
   
   * 677cbef4d404808777dad21fc19e68b332b0ef0b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2636)
   
   




[GitHub] [hudi] gaoshihang commented on issue #3790: [SUPPORT]Flink-cdc write to COW hudi table record duplicate

2021-10-13 Thread GitBox


gaoshihang commented on issue #3790:
URL: https://github.com/apache/hudi/issues/3790#issuecomment-942939360


   > No, spark also needs this option but with a different option key.
   
   thanks!






[GitHub] [hudi] danny0405 commented on issue #3790: [SUPPORT]Flink-cdc write to COW hudi table record duplicate

2021-10-13 Thread GitBox


danny0405 commented on issue #3790:
URL: https://github.com/apache/hudi/issues/3790#issuecomment-942938716


   No, spark also needs this option but with a different option key.






[GitHub] [hudi] gaoshihang commented on issue #3790: [SUPPORT]Flink-cdc write to COW hudi table record duplicate

2021-10-13 Thread GitBox


gaoshihang commented on issue #3790:
URL: https://github.com/apache/hudi/issues/3790#issuecomment-942932423


   May I ask another question: is this unique to Flink?






[GitHub] [hudi] gaoshihang closed issue #3790: [SUPPORT]Flink-cdc write to COW hudi table record duplicate

2021-10-13 Thread GitBox


gaoshihang closed issue #3790:
URL: https://github.com/apache/hudi/issues/3790


   






[GitHub] [hudi] gaoshihang commented on issue #3790: [SUPPORT]Flink-cdc write to COW hudi table record duplicate

2021-10-13 Thread GitBox


gaoshihang commented on issue #3790:
URL: https://github.com/apache/hudi/issues/3790#issuecomment-942929519


   Thank you very much! That resolved my problem.






[GitHub] [hudi] danny0405 commented on a change in pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


danny0405 commented on a change in pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#discussion_r728620696



##
File path: hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
##
@@ -189,6 +190,10 @@ public static HoodieWriteConfig getHoodieClientConfig(Configuration conf) {
 .enable(conf.getBoolean(FlinkOptions.METADATA_ENABLED))
 .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.METADATA_COMPACTION_DELTA_COMMITS))
 .build())
+.withPayloadConfig(HoodiePayloadConfig.newBuilder()
+.withPayloadOrderingField(conf.getString(FlinkOptions.PRECOMBINE_FIELD))
+.withPayloadEventTimeField(conf.getString(FlinkOptions.RECORD_KEY_FIELD))

Review comment:
   Yes, we can add that when it is needed.








[GitHub] [hudi] hudi-bot edited a comment on pull request #3519: [DO NOT MERGE] 0.9.0 release patch for flink

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3519:
URL: https://github.com/apache/hudi/pull/3519#issuecomment-903204631


   
   ## CI report:
   
   * 1b66ae85aed5f2f0d1542323be74216d062a5ca6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2612)
   * fd423c27cc15e112b99d8102ab7f5cb9a5d623c5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2637)
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3519: [DO NOT MERGE] 0.9.0 release patch for flink

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3519:
URL: https://github.com/apache/hudi/pull/3519#issuecomment-903204631


   
   ## CI report:
   
   * 1b66ae85aed5f2f0d1542323be74216d062a5ca6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2612)
   * fd423c27cc15e112b99d8102ab7f5cb9a5d623c5 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#issuecomment-942232592


   
   ## CI report:
   
   * ba0dc0a6169de8b9a2c6ee9659ee9b7750d4d5b4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2621) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2634)
   * 677cbef4d404808777dad21fc19e68b332b0ef0b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2636)
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#issuecomment-942232592


   
   ## CI report:
   
   * ba0dc0a6169de8b9a2c6ee9659ee9b7750d4d5b4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2621) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2634)
   * 677cbef4d404808777dad21fc19e68b332b0ef0b UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-938488397


   
   ## CI report:
   
   * faf3186897f0a7ab71d63cf5736ab45ae49347cb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2613) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2615) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2633)
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#issuecomment-942232592


   
   ## CI report:
   
   * ba0dc0a6169de8b9a2c6ee9659ee9b7750d4d5b4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2621) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2634)
   
   




[GitHub] [hudi] garyli1019 commented on a change in pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


garyli1019 commented on a change in pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#discussion_r728607020



##
File path: hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
##
@@ -189,6 +190,10 @@ public static HoodieWriteConfig getHoodieClientConfig(Configuration conf) {
 .enable(conf.getBoolean(FlinkOptions.METADATA_ENABLED))
 .withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.METADATA_COMPACTION_DELTA_COMMITS))
 .build())
+.withPayloadConfig(HoodiePayloadConfig.newBuilder()
+.withPayloadOrderingField(conf.getString(FlinkOptions.PRECOMBINE_FIELD))
+.withPayloadEventTimeField(conf.getString(FlinkOptions.RECORD_KEY_FIELD))

Review comment:
   Then we need an EVENTTIME_FIELD, right? We can default it to 
PRECOMBINE_FIELD, but I think in some cases users may set two separate fields 
for these.








[GitHub] [hudi] danny0405 commented on issue #3790: [SUPPORT]Flink-cdc write to COW hudi table record duplicate

2021-10-13 Thread GitBox


danny0405 commented on issue #3790:
URL: https://github.com/apache/hudi/issues/3790#issuecomment-942910695


   You need to set the option `write.insert.drop.duplicates` explicitly to 
deduplicate before merging; see the document: 
https://www.yuque.com/docs/share/01c98494-a980-414c-9c45-152023bf3c17?#pqEWP
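
A minimal sketch of the suggestion above: the option can be passed in the `WITH` clause of the Flink SQL table definition. The table name, columns, path, and the helper class are hypothetical; only the option key `write.insert.drop.duplicates` comes from this thread, and `'true'` is assumed to be the value that enables it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a Flink SQL DDL string for a Hudi table
// with duplicate dropping switched on. Table name and columns are made up.
class DropDuplicatesDdlSketch {

    static String buildDdl(String tablePath) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "hudi");
        options.put("path", tablePath);
        options.put("table.type", "COPY_ON_WRITE");
        options.put("hoodie.datasource.write.recordkey.field", "id");
        // Option named in the comment above; 'true' is an assumed value.
        options.put("write.insert.drop.duplicates", "true");

        String withClause = options.entrySet().stream()
            .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
            .collect(Collectors.joining(",\n"));

        return "CREATE TABLE demo_table (\n"
            + "  id VARCHAR(20),\n"
            + "  name VARCHAR(20),\n"
            + "  ts BIGINT\n"
            + ") WITH (\n"
            + withClause + "\n"
            + ")";
    }

    public static void main(String[] args) {
        // Print the DDL; in a real job it would be passed to executeSql(...).
        System.out.println(buildDdl("/tmp/hudi/demo_table"));
    }
}
```

The string is only assembled here, not executed, so the sketch runs without Flink or Hudi on the classpath.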






[GitHub] [hudi] danny0405 commented on a change in pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


danny0405 commented on a change in pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#discussion_r728601033



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java
##
@@ -129,6 +129,13 @@ public void initializeState(StateInitializationContext 
context) throws Exception
 WriteOperationType.fromValue(conf.getString(FlinkOptions.OPERATION)),
 HoodieTableType.valueOf(conf.getString(FlinkOptions.TABLE_TYPE)));
 
+preLoadIndexRecords();
+  }
+
+  /**
+   * Load the index records before {@link #processElement}.
+   */
+  protected void preLoadIndexRecords() throws Exception {

Review comment:
   Yes, it is just to fix the duplicate index loading in the test cases.








[GitHub] [hudi] danny0405 commented on a change in pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


danny0405 commented on a change in pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#discussion_r728600818



##
File path: hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
##
@@ -189,6 +190,10 @@ public static HoodieWriteConfig 
getHoodieClientConfig(Configuration conf) {
 .enable(conf.getBoolean(FlinkOptions.METADATA_ENABLED))
 
.withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.METADATA_COMPACTION_DELTA_COMMITS))
 .build())
+.withPayloadConfig(HoodiePayloadConfig.newBuilder()
+
.withPayloadOrderingField(conf.getString(FlinkOptions.PRECOMBINE_FIELD))
+
.withPayloadEventTimeField(conf.getString(FlinkOptions.RECORD_KEY_FIELD))

Review comment:
   Yes, you are right, as a default, we may use 
`FlinkOptions.PRECOMBINE_FIELD` as event time field.
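
   The fallback discussed here could look roughly like the sketch below. It uses plain `java.util.Properties` rather than Flink's `Configuration`. The class name, the option keys, and the `ts` default are assumptions for illustration, not confirmed Hudi options.

```java
import java.util.Properties;

final class EventTimeFieldSketch {
  // Hypothetical option keys, for illustration only.
  static final String PRECOMBINE_FIELD = "write.precombine.field";
  static final String EVENT_TIME_FIELD = "write.eventtime.field";

  /** Returns the event-time field, falling back to the precombine field when unset. */
  static String resolveEventTimeField(Properties conf) {
    // "ts" is an assumed default for the precombine field.
    String precombine = conf.getProperty(PRECOMBINE_FIELD, "ts");
    return conf.getProperty(EVENT_TIME_FIELD, precombine);
  }
}
```

   With this shape, users who keep event time and ordering in the same column need no extra configuration, while users with two separate columns can override the event-time field explicitly.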








[GitHub] [hudi] danny0405 commented on a change in pull request #3787: [HUDI-2548] Flink streaming reader misses the rolling over file handles

2021-10-13 Thread GitBox


danny0405 commented on a change in pull request #3787:
URL: https://github.com/apache/hudi/pull/3787#discussion_r728599250



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteFunction.java
##
@@ -139,7 +139,7 @@ public void processElement(I value, ProcessFunction.Context ctx, Coll
   public void close() {
 if (this.writeClient != null) {
   this.writeClient.cleanHandlesGracefully();
-  this.writeClient.close();
+  // this.writeClient.close();

Review comment:
   Because the embedded timeline server is a singleton within the JVM process, 
if one thread starts to close the server, the other threads that still need it 
would run into exceptions.
   
   We will fix this by making the server a driver service in a follow-up PR.
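
   One common way to make a process-wide singleton safe to close from several threads is reference counting. The sketch below is only an illustration of that idea, not the approach taken in Hudi; `SharedServer` and its methods are hypothetical names.

```java
/** Hypothetical JVM-wide shared service guarded by a reference count. */
final class SharedServer {
  private static final SharedServer INSTANCE = new SharedServer();

  private int refCount = 0;
  private boolean running = false;

  private SharedServer() {
  }

  static SharedServer instance() {
    return INSTANCE;
  }

  /** Registers a user; the first user starts the server. */
  synchronized void acquire() {
    if (refCount == 0) {
      running = true; // a real server would be started here
    }
    refCount++;
  }

  /** Unregisters a user; only the last user actually stops the server. */
  synchronized void release() {
    if (refCount == 0) {
      throw new IllegalStateException("release() called without a matching acquire()");
    }
    refCount--;
    if (refCount == 0) {
      running = false; // a real server would be stopped here
    }
  }

  synchronized boolean isRunning() {
    return running;
  }
}
```

   Each writer would call `acquire()` when it opens and `release()` when it closes, so one writer closing early does not stop the server for the others.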








[GitHub] [hudi] SteNicholas commented on a change in pull request #3787: [HUDI-2548] Flink streaming reader misses the rolling over file handles

2021-10-13 Thread GitBox


SteNicholas commented on a change in pull request #3787:
URL: https://github.com/apache/hudi/pull/3787#discussion_r728579081



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
##
@@ -142,6 +143,60 @@ public WriteOperationType getOperationType() {
 return fileGroupIdToFullPaths;
   }
 
+  /**
+   * Extract the file status of all affected files from the commit metadata. 
If a file has
+   * been touched multiple times in the given commits, the return value will 
keep the one
+   * from the latest commit.
+   *
+   * @param basePath The base path
+   * @return the file full path to file status mapping
+   */
+  public Map<String, FileStatus> getFullPathToFileStatus(String basePath) {
+Map<String, FileStatus> fullPathToFileStatus = new HashMap<>();
+for (List<HoodieWriteStat> stats : getPartitionToWriteStats().values()) {
+  // Iterate through all the written files.
+  for (HoodieWriteStat stat : stats) {
+String relativeFilePath = stat.getPath();
+Path fullPath = relativeFilePath != null ? 
FSUtils.getPartitionPath(basePath, relativeFilePath) : null;

Review comment:
   IMO, this could check directly whether `relativeFilePath` is null before 
putting the `fileStatus` into `fullPathToFileStatus`.
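
   A minimal sketch of the suggested shape, with plain strings and sizes standing in for Hadoop's `Path` and `FileStatus`; `NullCheckSketch` and `fullPathToSize` are hypothetical names. The null check on the relative path comes first, so no nullable intermediate `fullPath` is needed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NullCheckSketch {
  /**
   * Maps full path to file size. Each input entry is (relative path, size);
   * entries with a null relative path are skipped up front.
   */
  static Map<String, Long> fullPathToSize(String basePath, List<Map.Entry<String, Long>> stats) {
    Map<String, Long> result = new HashMap<>();
    for (Map.Entry<String, Long> stat : stats) {
      String relativePath = stat.getKey();
      if (relativePath == null) {
        continue; // nothing was written for this stat
      }
      result.put(basePath + "/" + relativePath, stat.getValue());
    }
    return result;
  }
}
```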

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
##
@@ -142,6 +143,60 @@ public WriteOperationType getOperationType() {
 return fileGroupIdToFullPaths;
   }
 
+  /**
+   * Extract the file status of all affected files from the commit metadata. 
If a file has
+   * been touched multiple times in the given commits, the return value will 
keep the one
+   * from the latest commit.
+   *
+   * @param basePath The base path
+   * @return the file full path to file status mapping
+   */
+  public Map<String, FileStatus> getFullPathToFileStatus(String basePath) {
+Map<String, FileStatus> fullPathToFileStatus = new HashMap<>();
+for (List<HoodieWriteStat> stats : getPartitionToWriteStats().values()) {
+  // Iterate through all the written files.
+  for (HoodieWriteStat stat : stats) {
+String relativeFilePath = stat.getPath();
+Path fullPath = relativeFilePath != null ? 
FSUtils.getPartitionPath(basePath, relativeFilePath) : null;
+if (fullPath != null) {
+  FileStatus fileStatus = new FileStatus(stat.getFileSizeInBytes(), 
false, 0, 0,
+  0, fullPath);
+  fullPathToFileStatus.put(fullPath.getName(), fileStatus);
+}
+  }
+}
+return fullPathToFileStatus;
+  }
+
+  /**
+   * Extract the file status of all affected files from the commit metadata. 
If a file has
+   * been touched multiple times in the given commits, the return value will 
keep the one
+   * from the latest commit by file group ID.
+   *
+   * Note: different with {@link #getFullPathToFileStatus(String)},
+   * only the latest commit file for a file group is returned,
+   * this is an optimization for COPY_ON_WRITE table to eliminate legacy files 
for filesystem view.
+   *
+   * @param basePath The base path
+   * @return the file ID to file status mapping
+   */
+  public Map<String, FileStatus> getFileIdToFileStatus(String basePath) {
+Map<String, FileStatus> fileIdToFileStatus = new HashMap<>();
+for (List<HoodieWriteStat> stats : getPartitionToWriteStats().values()) {
+  // Iterate through all the written files.
+  for (HoodieWriteStat stat : stats) {
+String relativeFilePath = stat.getPath();
+Path fullPath = relativeFilePath != null ? 
FSUtils.getPartitionPath(basePath, relativeFilePath) : null;

Review comment:
   Ditto.

##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteFunction.java
##
@@ -139,7 +139,7 @@ public void processElement(I value, ProcessFunction.Context ctx, Coll
   public void close() {
 if (this.writeClient != null) {
   this.writeClient.cleanHandlesGracefully();
-  this.writeClient.close();
+  // this.writeClient.close();

Review comment:
   Why doesn't this invoke the `close` method of the write client?








[GitHub] [hudi] danny0405 commented on a change in pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-13 Thread GitBox


danny0405 commented on a change in pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#discussion_r728597616



##
File path: hudi-common/src/main/java/org/apache/hudi/common/data/HoodieData.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.data;
+
+import org.apache.hudi.common.function.SerializableFunction;
+
+import java.io.Serializable;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Properties;
+
+/**
+ * An abstraction for a data collection of objects in type T to store the 
reference
+ * and do transformation.
+ *
+ * @param <T> type of object.
+ */
+public abstract class HoodieData<T> implements Serializable {

Review comment:
   `HoodieCollection` seems a better name because it is mainly used to avoid 
the Java annotation diff.








[GitHub] [hudi] danny0405 commented on a change in pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-13 Thread GitBox


danny0405 commented on a change in pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#discussion_r728594783



##
File path: 
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java
##
@@ -383,7 +388,13 @@ public void completeCompaction(
   protected List compact(String compactionInstantTime, boolean 
shouldComplete) {
 // only used for metadata table, the compaction happens in single thread
 try {
-  List writeStatuses = 
FlinkCompactHelpers.compact(compactionInstantTime, this);
+  RunCompactionActionExecutor compactionExecutor = new 
RunCompactionActionExecutor(
+  context, config, getHoodieTable(), compactionInstantTime, this,

Review comment:
   A better way is to move the `new RunCompactionActionExecutor` construction 
into the `HoodieFlinkTable#compact` implementation.








[GitHub] [hudi] hudi-bot edited a comment on pull request #3668: [RFC-33] [HUDI-2429][WIP] Full schema evolution

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3668:
URL: https://github.com/apache/hudi/pull/3668#issuecomment-919855741


   
   ## CI report:
   
   * aaf33e3a28680ab5febca7df70937ce543619a94 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2632)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] danny0405 commented on a change in pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-13 Thread GitBox


danny0405 commented on a change in pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#discussion_r728587644



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/HoodieWriteMetadata.java
##
@@ -46,6 +46,36 @@
   public HoodieWriteMetadata() {
   }
 
+  /**
+   * Clones the write metadata with transformed write statuses.
+   *
+   * @param transformedWriteStatuses transformed write statuses
+   * @param <T> type of transformed write statuses
+   * @return Cloned {@link HoodieWriteMetadata} instance
+   */
+  public <T> HoodieWriteMetadata<T> clone(T transformedWriteStatuses) {
+HoodieWriteMetadata<T> newMetadataInstance = new HoodieWriteMetadata<>();
+newMetadataInstance.setWriteStatuses(transformedWriteStatuses);
+if (indexLookupDuration.isPresent()) {

Review comment:
   We should find a way to eliminate the metadata clone, which is hard to 
maintain and buggy.








[GitHub] [hudi] garyli1019 commented on a change in pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


garyli1019 commented on a change in pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#discussion_r728585655



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java
##
@@ -129,6 +129,13 @@ public void initializeState(StateInitializationContext 
context) throws Exception
 WriteOperationType.fromValue(conf.getString(FlinkOptions.OPERATION)),
 HoodieTableType.valueOf(conf.getString(FlinkOptions.TABLE_TYPE)));
 
+preLoadIndexRecords();
+  }
+
+  /**
+   * Load the index records before {@link #processElement}.
+   */
+  protected void preLoadIndexRecords() throws Exception {

Review comment:
   not related to this PR?

##
File path: hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
##
@@ -189,6 +190,10 @@ public static HoodieWriteConfig 
getHoodieClientConfig(Configuration conf) {
 .enable(conf.getBoolean(FlinkOptions.METADATA_ENABLED))
 
.withMaxNumDeltaCommitsBeforeCompaction(conf.getInteger(FlinkOptions.METADATA_COMPACTION_DELTA_COMMITS))
 .build())
+.withPayloadConfig(HoodiePayloadConfig.newBuilder()
+
.withPayloadOrderingField(conf.getString(FlinkOptions.PRECOMBINE_FIELD))
+
.withPayloadEventTimeField(conf.getString(FlinkOptions.RECORD_KEY_FIELD))

Review comment:
   The event-time key is actually different from the record key: it should be 
in a timestamp format. Should we add another option?








[GitHub] [hudi] hudi-bot edited a comment on pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#issuecomment-942232592


   
   ## CI report:
   
   * ba0dc0a6169de8b9a2c6ee9659ee9b7750d4d5b4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2621)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2634)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] Neo966 opened a new issue #3795: [SUPPORT] hive query hudi error

2021-10-13 Thread GitBox


Neo966 opened a new issue #3795:
URL: https://github.com/apache/hudi/issues/3795


   hive version:2.1.1
   flink:1.12.2 scala:2.11
   
   1. hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
select count(*) from xxx; // 103828120, which is wrong; the actual number is 
18874368.
   
   2. hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
select count(*) from xxx; // Error: Error while processing statement: 
FAILED: Execution Error, return code -101 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask. 
org/apache/hadoop/hive/common/StringInternUtils (state=08S01,code=-101)
   
   3. hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
select count(*) from xxx; // 18874368
select * from xxx limit 18874360, 10; // works; displays the last 8 
records normally.
select count(*) from xxx where name = 'lisi'; // 2097152
select * from xxx where name = 'lisi' limit 2097150, 10; // wrong result: 
no records are returned, but the last 2 records should be.






[GitHub] [hudi] hudi-bot edited a comment on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-938488397


   
   ## CI report:
   
   * faf3186897f0a7ab71d63cf5736ab45ae49347cb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2613)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2615)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2633)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] danny0405 commented on pull request #3792: [HUDI-2551] Support DefaultHoodieRecordPayload for flink

2021-10-13 Thread GitBox


danny0405 commented on pull request #3792:
URL: https://github.com/apache/hudi/pull/3792#issuecomment-942892659


   @hudi-bot run azure






[GitHub] [hudi] zhangyue19921010 commented on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-10-13 Thread GitBox


zhangyue19921010 commented on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-942892121


   @hudi-bot run azure






[jira] [Resolved] (HUDI-2548) Flink streaming reader misses the rolling over file handles

2021-10-13 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2548.
--
Resolution: Fixed

Fixed via master branch: abf3e3fe71cd92a4129cf110a5206fbcfb3b1ae2

> Flink streaming reader misses the rolling over file handles
> ---
>
> Key: HUDI-2548
> URL: https://issues.apache.org/jira/browse/HUDI-2548
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhangyue19921010 removed a comment on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-10-13 Thread GitBox


zhangyue19921010 removed a comment on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-942114128


   @hudi-bot run azure






[hudi] branch master updated (cff384d -> abf3e3f)

2021-10-13 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from cff384d  [HUDI-2552] Fixing some test failures to unblock broken CI 
master (#3793)
 add abf3e3f  [HUDI-2548] Flink streaming reader misses the rolling over 
file handles (#3787)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |  2 +-
 .../hudi/common/model/HoodieCommitMetadata.java| 59 -
 .../table/timeline/HoodieArchivedTimeline.java | 13 ++--
 .../apache/hudi/configuration/FlinkOptions.java|  4 +-
 .../org/apache/hudi/sink/StreamWriteFunction.java  |  5 +-
 .../hudi/sink/StreamWriteOperatorCoordinator.java  |  8 ++-
 .../sink/partitioner/profile/WriteProfile.java |  2 +-
 .../sink/partitioner/profile/WriteProfiles.java| 77 --
 .../apache/hudi/source/IncrementalInputSplits.java | 23 ++-
 .../hudi/source/StreamReadMonitoringFunction.java  |  2 +-
 .../apache/hudi/streamer/FlinkStreamerConfig.java  |  4 +-
 .../java/org/apache/hudi/util/StreamerUtil.java| 12 +++-
 .../org/apache/hudi/sink/TestWriteCopyOnWrite.java | 13 ++--
 .../apache/hudi/table/HoodieDataSourceITCase.java  | 38 +--
 .../hudi/hadoop/utils/HoodieInputFormatUtils.java  | 72 ++--
 .../hudi/MergeOnReadIncrementalRelation.scala  | 17 ++---
 16 files changed, 225 insertions(+), 126 deletions(-)


[GitHub] [hudi] danny0405 merged pull request #3787: [HUDI-2548] Flink streaming reader misses the rolling over file handles

2021-10-13 Thread GitBox


danny0405 merged pull request #3787:
URL: https://github.com/apache/hudi/pull/3787


   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3668: [RFC-33] [HUDI-2429][WIP] Full schema evolution

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3668:
URL: https://github.com/apache/hudi/pull/3668#issuecomment-919855741


   
   ## CI report:
   
   * 89dab78876b2512aa4967ced70da27f6fdb46b14 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2619)
 
   * aaf33e3a28680ab5febca7df70937ce543619a94 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2632)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] SteNicholas removed a comment on pull request #3779: [HUDI-2503] HoodieFlinkWriteClient supports to allow parallel writing to tables using Locking service

2021-10-13 Thread GitBox


SteNicholas removed a comment on pull request #3779:
URL: https://github.com/apache/hudi/pull/3779#issuecomment-942881620


   > stored
   
   @danny0405, IMO, I could do some work in this pull request to share the 
Flink index that is stored in the state. 
   What do you think?






[GitHub] [hudi] SteNicholas commented on pull request #3779: [HUDI-2503] HoodieFlinkWriteClient supports to allow parallel writing to tables using Locking service

2021-10-13 Thread GitBox


SteNicholas commented on pull request #3779:
URL: https://github.com/apache/hudi/pull/3779#issuecomment-942881963


   > Please do not merge before we can share the flink index which is stored in 
the state.
   
   @danny0405, IMO, I could do some work in this pull request to share the 
Flink index that is stored in the state. 
   What do you think?






[GitHub] [hudi] SteNicholas edited a comment on pull request #3779: [HUDI-2503] HoodieFlinkWriteClient supports to allow parallel writing to tables using Locking service

2021-10-13 Thread GitBox


SteNicholas edited a comment on pull request #3779:
URL: https://github.com/apache/hudi/pull/3779#issuecomment-942881620


   > stored
   
   @danny0405, IMO, I could do some work in this pull request to share the 
Flink index that is stored in the state. 
   What do you think?






[GitHub] [hudi] SteNicholas removed a comment on pull request #3779: [HUDI-2503] HoodieFlinkWriteClient supports to allow parallel writing to tables using Locking service

2021-10-13 Thread GitBox


SteNicholas removed a comment on pull request #3779:
URL: https://github.com/apache/hudi/pull/3779#issuecomment-942881691


   > stored
   
   @danny0405, IMO, I could do some work in this pull request to share the 
Flink index that is stored in the state.
   What do you think?






[GitHub] [hudi] SteNicholas commented on pull request #3779: [HUDI-2503] HoodieFlinkWriteClient supports to allow parallel writing to tables using Locking service

2021-10-13 Thread GitBox


SteNicholas commented on pull request #3779:
URL: https://github.com/apache/hudi/pull/3779#issuecomment-942881620










[GitHub] [hudi] nsivabalan commented on pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-13 Thread GitBox


nsivabalan commented on pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#issuecomment-942878688


   @hudi-bot azure run






[jira] [Commented] (HUDI-2549) Exceptions when using second writer into Hudi table managed by DeltaStreamer

2021-10-13 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428561#comment-17428561
 ] 

sivabalan narayanan commented on HUDI-2549:
---

Hey Dave. In my local setup, I did not have DeltaStreamer doing 1 commit per 
30 seconds. Also, my Spark writer was fast, and its start time did not 
interfere with DeltaStreamer. But I ensured that DeltaStreamer did not have any 
issues with its checkpoint when a Spark writer was interleaved. 

> Exceptions when using second writer into Hudi table managed by DeltaStreamer
> 
>
> Key: HUDI-2549
> URL: https://issues.apache.org/jira/browse/HUDI-2549
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer, Spark Integration, Writer Core
>Reporter: Dave Hagman
>Assignee: Dave Hagman
>Priority: Critical
>  Labels: multi-writer, sev:critical
> Fix For: 0.10.0
>
>
> When running the DeltaStreamer along with a second spark datasource writer 
> (with [ZK-based OCC 
> enabled|https://hudi.apache.org/docs/concurrency_control#enabling-multi-writing]) 
> we receive the following exception (which halts the spark datasource 
> writer). This occurs following warnings of timeline inconsistencies:
>  
> {code:java}
> 21/10/07 17:10:05 INFO TransactionManager: Transaction ending with 
> transaction owner Option{val=[==>20211007170717__commit__INFLIGHT]}
> 21/10/07 17:10:05 INFO ZookeeperBasedLockProvider: RELEASING lock 
> atZkBasePath = /events/test/mwc/v1, lock key = events_mwc_test_v1
> 21/10/07 17:10:05 INFO ZookeeperBasedLockProvider: RELEASED lock atZkBasePath 
> = /events/test/mwc/v1, lock key = events_mwc_test_v1
> 21/10/07 17:10:05 INFO TransactionManager: Transaction ended
> Exception in thread "main" java.lang.IllegalArgumentException
> at 
> org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
> at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:414)
> at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:395)
> at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveAsComplete(HoodieActiveTimeline.java:153)
> at 
> org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:218)
> at 
> org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:190)
> at 
> org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:124)
> at 
> org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:617)
> at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:274)
> at 
> org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
> at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
> at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
> at 
> org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
> at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
> at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
> at 
> org.apache.spark

[GitHub] [hudi] nsivabalan commented on pull request #3794: [HUDI-2553] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-13 Thread GitBox


nsivabalan commented on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-942876773


   @hudi-bot azure run






[GitHub] [hudi] hudi-bot edited a comment on pull request #3668: [RFC-33] [HUDI-2429][WIP] Full schema evolution

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3668:
URL: https://github.com/apache/hudi/pull/3668#issuecomment-919855741


   
   ## CI report:
   
   * 89dab78876b2512aa4967ced70da27f6fdb46b14 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2619)
 
   * aaf33e3a28680ab5febca7df70937ce543619a94 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3794: [HUDI-2553] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-942793927


   
   ## CI report:
   
   * 31852dac3234f80b094392197a34ac5704f2e784 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2631)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3794: [HUDI-2553] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-942793927


   
   ## CI report:
   
   * f4b16e728f180c9fc4655ae052bb89b2f6a1ff8b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2630)
 
   * 31852dac3234f80b094392197a34ac5704f2e784 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2631)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3794: [HUDI-2553] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-942793927


   
   ## CI report:
   
   * f4b16e728f180c9fc4655ae052bb89b2f6a1ff8b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2630)
 
   * 31852dac3234f80b094392197a34ac5704f2e784 UNKNOWN
   
   




[jira] [Updated] (HUDI-2553) Re-enable max delta commits for metadata table to 10

2021-10-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2553:
-
Labels: pull-request-available  (was: )

> Re-enable max delta commits for metadata table to 10
> 
>
> Key: HUDI-2553
> URL: https://issues.apache.org/jira/browse/HUDI-2553
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Major
>  Labels: pull-request-available
>
> Our CI was broken recently, so we reverted a couple of tests and the default 
> value for max delta commits for the metadata table. 
> [https://github.com/apache/hudi/pull/3793]
>  
> Please set it back to 10. Let's re-run CI for the patch a few times to ensure 
> there is no flakiness.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3794: [HUDI-2553] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-942793927


   
   ## CI report:
   
   * f4b16e728f180c9fc4655ae052bb89b2f6a1ff8b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2630)
 
   * 31852dac3234f80b094392197a34ac5704f2e784 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * 333c80ea94b4ed248108d68357e5729bd6613104 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2629)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2553) Re-enable max delta commits for metadata table to 10

2021-10-13 Thread Manoj Govindassamy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HUDI-2553:
-
Status: In Progress  (was: Open)

> Re-enable max delta commits for metadata table to 10
> 
>
> Key: HUDI-2553
> URL: https://issues.apache.org/jira/browse/HUDI-2553
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Major
>
> Our CI was broken recently, so we reverted a couple of tests and the default 
> value for max delta commits for the metadata table. 
> [https://github.com/apache/hudi/pull/3793]
>  
> Please set it back to 10. Let's re-run CI for the patch a few times to ensure 
> there is no flakiness.
>  





[jira] [Updated] (HUDI-2553) Re-enable max delta commits for metadata table to 10

2021-10-13 Thread Manoj Govindassamy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HUDI-2553:
-
Status: Patch Available  (was: In Progress)

> Re-enable max delta commits for metadata table to 10
> 
>
> Key: HUDI-2553
> URL: https://issues.apache.org/jira/browse/HUDI-2553
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Major
>
> Our CI was broken recently, so we reverted a couple of tests and the default 
> value for max delta commits for the metadata table. 
> [https://github.com/apache/hudi/pull/3793]
>  
> Please set it back to 10. Let's re-run CI for the patch a few times to ensure 
> there is no flakiness.
>  





[jira] [Updated] (HUDI-2532) Set right default value for max delta commits for compaction in metadata table

2021-10-13 Thread Manoj Govindassamy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HUDI-2532:
-
Status: Closed  (was: Patch Available)

> Set right default value for max delta commits for compaction in metadata 
> table 
> ---
>
> Key: HUDI-2532
> URL: https://issues.apache.org/jira/browse/HUDI-2532
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Set the right default value of 10 for max delta commits for compaction in 
> the metadata table. As of now, it is set to 24, which is too high. 





[jira] [Updated] (HUDI-2532) Set right default value for max delta commits for compaction in metadata table

2021-10-13 Thread Manoj Govindassamy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HUDI-2532:
-
Status: Patch Available  (was: In Progress)

> Set right default value for max delta commits for compaction in metadata 
> table 
> ---
>
> Key: HUDI-2532
> URL: https://issues.apache.org/jira/browse/HUDI-2532
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Set the right default value of 10 for max delta commits for compaction in 
> the metadata table. As of now, it is set to 24, which is too high. 





[GitHub] [hudi] nsivabalan commented on a change in pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-13 Thread GitBox


nsivabalan commented on a change in pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#discussion_r728518751



##
File path: 
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java
##
@@ -126,23 +130,21 @@ protected BaseTableMetadata(HoodieEngineContext 
engineContext, HoodieMetadataCon
   }
 
   @Override
-  public Map getAllFilesInPartitions(List 
partitionPaths)
+  public Map getAllFilesInPartitions(List 
partitions)
   throws IOException {
 if (enabled) {
-  Map partitionsFilesMap = new HashMap<>();
-
   try {
-for (String partitionPath : partitionPaths) {
-  partitionsFilesMap.put(partitionPath, fetchAllFilesInPartition(new 
Path(partitionPath)));
-}
+// need to understand why we did not make bulk get before

Review comment:
   From what I can infer, with HoodieMergedLogRecordScanner we first read all 
records from all log blocks and build a hash map of records (record key to 
HoodieRecord). Prior to this patch we did not do seek-based reads, so we read 
all log records from all log blocks regardless. So I was a bit curious. 
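
The difference between the two read paths described in this comment can be sketched as follows. This is a toy model under stated assumptions, not Hudi code: each log block is stood in for by a sorted in-memory map, a "seek" is modelled as a keyed lookup on that map, and all names (`LogBlockLookupSketch`, `fullScanLookup`, `seekLookup`, the `p1`..`p4` keys) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class LogBlockLookupSketch {

    // Full scan: materialize every record of every block into one hash map,
    // then answer the lookups from it. recordsRead[0] counts records touched.
    static Map<String, String> fullScanLookup(List<NavigableMap<String, String>> blocks,
                                              List<String> keys, int[] recordsRead) {
        Map<String, String> merged = new HashMap<>();
        for (NavigableMap<String, String> block : blocks) {
            for (Map.Entry<String, String> e : block.entrySet()) {
                merged.put(e.getKey(), e.getValue()); // later blocks override earlier ones
                recordsRead[0]++;
            }
        }
        Map<String, String> result = new HashMap<>();
        for (String key : keys) {
            String value = merged.get(key);
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }

    // Seek-based batch get: sort the requested keys and look each one up
    // directly in every sorted block; only matching records are read.
    static Map<String, String> seekLookup(List<NavigableMap<String, String>> blocks,
                                          List<String> keys, int[] recordsRead) {
        List<String> sortedKeys = new ArrayList<>(keys);
        Collections.sort(sortedKeys);
        Map<String, String> result = new HashMap<>();
        for (NavigableMap<String, String> block : blocks) {
            for (String key : sortedKeys) {
                String value = block.get(key);
                if (value != null) {
                    result.put(key, value); // later blocks override earlier ones
                    recordsRead[0]++;
                }
            }
        }
        return result;
    }

    // Two small blocks; the second updates partition "p2" and adds "p4".
    static List<NavigableMap<String, String>> sampleBlocks() {
        NavigableMap<String, String> first = new TreeMap<>();
        first.put("p1", "f1");
        first.put("p2", "f2");
        first.put("p3", "f3");
        NavigableMap<String, String> second = new TreeMap<>();
        second.put("p2", "f2,f4");
        second.put("p4", "f5");
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("p2", "p4");
        int[] full = new int[1];
        int[] seek = new int[1];
        System.out.println(fullScanLookup(sampleBlocks(), keys, full) + " read=" + full[0]);
        System.out.println(seekLookup(sampleBlocks(), keys, seek) + " read=" + seek[0]);
    }
}
```

Both paths return the same values for the requested keys; the sketch only shows that the full scan touches every record, while the keyed path touches just the matches.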
   








[GitHub] [hudi] hudi-bot edited a comment on pull request #3794: [HUDI-2532] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-942793927


   
   ## CI report:
   
   * f4b16e728f180c9fc4655ae052bb89b2f6a1ff8b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2630)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3794: [HUDI-2532] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-13 Thread GitBox


hudi-bot commented on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-942793927


   
   ## CI report:
   
   * f4b16e728f180c9fc4655ae052bb89b2f6a1ff8b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-13 Thread GitBox


nsivabalan commented on pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#issuecomment-942793206


   @hudi-bot azure run






[GitHub] [hudi] manojpec opened a new pull request #3794: [HUDI-2532] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-13 Thread GitBox


manojpec opened a new pull request #3794:
URL: https://github.com/apache/hudi/pull/3794


   ## What is the purpose of the pull request
   
   Setting the max delta commits default config to 10 (previously it was 24) to 
trigger compaction in the metadata table sooner than before.
   
   The previous change for this, https://github.com/apache/hudi/pull/3784, is 
suspected of breaking CI, so this PR redoes the change to let CI catch any 
flakiness.
   
   ## Brief change log
   
   * Updated the default config value in HoodieMetadataConfig.java
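
The effect of lowering the threshold can be sketched with a toy model. It assumes that compaction is scheduled once the number of delta commits since the last compaction reaches the configured maximum and that the counter then resets; the class and method names are hypothetical and this is not the code in HoodieMetadataConfig.java.

```java
// Toy model: how often compaction triggers for a given max-delta-commits value.
final class MetadataCompactionTriggerSketch {

    // Assumed rule: compact once pending delta commits reach the threshold.
    static boolean shouldCompact(int deltaCommitsSinceLastCompaction, int maxDeltaCommits) {
        return deltaCommitsSinceLastCompaction >= maxDeltaCommits;
    }

    // Replays totalDeltaCommits commits and counts how many compactions fire.
    static int countCompactions(int totalDeltaCommits, int maxDeltaCommits) {
        int pending = 0;
        int compactions = 0;
        for (int i = 0; i < totalDeltaCommits; i++) {
            pending++;
            if (shouldCompact(pending, maxDeltaCommits)) {
                compactions++;
                pending = 0; // counter resets after a compaction
            }
        }
        return compactions;
    }

    public static void main(String[] args) {
        System.out.println("threshold 10: " + countCompactions(48, 10) + " compactions");
        System.out.println("threshold 24: " + countCompactions(48, 24) + " compactions");
    }
}
```

Under this model, 48 delta commits produce four compactions with a threshold of 10 and two with a threshold of 24, so fewer log blocks accumulate between compactions at the lower value.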
   
   ## Verify this pull request
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * 3b08956e6b53ba25be60a659ac6d28d147d9a77b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2625)
 
   * 333c80ea94b4ed248108d68357e5729bd6613104 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2629)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * 3b08956e6b53ba25be60a659ac6d28d147d9a77b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2625)
 
   * 333c80ea94b4ed248108d68357e5729bd6613104 UNKNOWN
   
   




[jira] [Updated] (HUDI-2555) Fix flaky FlinkCompaction integration test

2021-10-13 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2555:
--
Description: 
CI was broken recently, and we had to revert some suspect tests. 
[https://github.com/apache/hudi/pull/3793]

We need to fix them and re-enable them. 

[ITTestHoodieFlinkCompactor.java|https://github.com/apache/hudi/pull/3793/files#diff-f15b4ec18c40c9494e62ae73aa4b79beeafd1a5fa185b6ec6a7044fa6ed9e1fd]
 

testHoodieFlinkCompactor

  was:
CI was broken recently, and we had to revert some suspect tests. 

We need to fix them and re-enable them. 

[ITTestHoodieFlinkCompactor.java|https://github.com/apache/hudi/pull/3793/files#diff-f15b4ec18c40c9494e62ae73aa4b79beeafd1a5fa185b6ec6a7044fa6ed9e1fd]
 

testHoodieFlinkCompactor


> Fix flaky FlinkCompaction integration test
> --
>
> Key: HUDI-2555
> URL: https://issues.apache.org/jira/browse/HUDI-2555
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Danny Chen
>Priority: Major
>
> CI was broken recently, and we had to revert some suspect tests. 
> [https://github.com/apache/hudi/pull/3793]
> We need to fix them and re-enable them. 
> [ITTestHoodieFlinkCompactor.java|https://github.com/apache/hudi/pull/3793/files#diff-f15b4ec18c40c9494e62ae73aa4b79beeafd1a5fa185b6ec6a7044fa6ed9e1fd]
>  
> testHoodieFlinkCompactor





[jira] [Updated] (HUDI-2553) Re-enable max delta commits for metadata table to 10

2021-10-13 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2553:
--
Description: 
Our CI was broken recently, so we reverted a couple of tests and the default 
value for max delta commits for the metadata table. 

[https://github.com/apache/hudi/pull/3793]

 

Please set it back to 10. Let's re-run CI for the patch a few times to ensure 
there is no flakiness.

 

  was:
Our CI was broken recently, so we reverted a couple of tests and the default 
value for max delta commits for the metadata table. 

Please set it back to 10. Let's re-run CI for the patch a few times to ensure 
there is no flakiness.

 


> Re-enable max delta commits for metadata table to 10
> 
>
> Key: HUDI-2553
> URL: https://issues.apache.org/jira/browse/HUDI-2553
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Major
>
> Our CI was broken recently, so we reverted a couple of tests and the default 
> value for max delta commits for the metadata table. 
> [https://github.com/apache/hudi/pull/3793]
>  
> Please set it back to 10. Let's re-run CI for the patch a few times to ensure 
> there is no flakiness.
>  





[jira] [Created] (HUDI-2555) Fix flaky FlinkCompaction integration test

2021-10-13 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-2555:
-

 Summary: Fix flaky FlinkCompaction integration test
 Key: HUDI-2555
 URL: https://issues.apache.org/jira/browse/HUDI-2555
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: sivabalan narayanan


CI was broken recently, and we had to revert some suspect tests. 

We need to fix them and re-enable them. 

[ITTestHoodieFlinkCompactor.java|https://github.com/apache/hudi/pull/3793/files#diff-f15b4ec18c40c9494e62ae73aa4b79beeafd1a5fa185b6ec6a7044fa6ed9e1fd]
 

testHoodieFlinkCompactor





[jira] [Assigned] (HUDI-2555) Fix flaky FlinkCompaction integration test

2021-10-13 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2555:
-

Assignee: Danny Chen

> Fix flaky FlinkCompaction integration test
> --
>
> Key: HUDI-2555
> URL: https://issues.apache.org/jira/browse/HUDI-2555
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Danny Chen
>Priority: Major
>
> CI was broken recently, and we had to revert some suspect tests. 
> We need to fix them and re-enable them. 
> [ITTestHoodieFlinkCompactor.java|https://github.com/apache/hudi/pull/3793/files#diff-f15b4ec18c40c9494e62ae73aa4b79beeafd1a5fa185b6ec6a7044fa6ed9e1fd]
>  
> testHoodieFlinkCompactor





[jira] [Assigned] (HUDI-2553) Re-enable max delta commits for metadata table to 10

2021-10-13 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2553:
-

Assignee: Manoj Govindassamy

> Re-enable max delta commits for metadata table to 10
> 
>
> Key: HUDI-2553
> URL: https://issues.apache.org/jira/browse/HUDI-2553
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Major
>
> Our CI was broken recently, so we reverted a couple of tests and the default 
> value for max delta commits for the metadata table. 
> Please set it back to 10. Let's re-run CI for the patch a few times to ensure 
> there is no flakiness.
>  





[jira] [Created] (HUDI-2554) Fix some flaky metadata tests

2021-10-13 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-2554:
-

 Summary: Fix some flaky metadata tests
 Key: HUDI-2554
 URL: https://issues.apache.org/jira/browse/HUDI-2554
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: sivabalan narayanan


CI was broken recently, and we had to disable a few tests. 

[https://github.com/apache/hudi/pull/3793/files]

TestHoodieBackedMetadata

testRollbackOperations

testErrorCases





[jira] [Assigned] (HUDI-2554) Fix some flaky metadata tests

2021-10-13 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2554:
-

Assignee: sivabalan narayanan

> Fix some flaky metadata tests
> -
>
> Key: HUDI-2554
> URL: https://issues.apache.org/jira/browse/HUDI-2554
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> CI was broken recently, and we had to disable a few tests. 
> [https://github.com/apache/hudi/pull/3793/files]
> TestHoodieBackedMetadata
> testRollbackOperations
> testErrorCases





[jira] [Created] (HUDI-2553) Re-enable max delta commits for metadata table to 10

2021-10-13 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-2553:
-

 Summary: Re-enable max delta commits for metadata table to 10
 Key: HUDI-2553
 URL: https://issues.apache.org/jira/browse/HUDI-2553
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: sivabalan narayanan


Our CI was broken recently, so we reverted a couple of tests and the default 
value for max delta commits for the metadata table. 

Please set it back to 10. Let's re-run CI for the patch a few times to ensure 
there is no flakiness.

 





[hudi] branch master updated (e6711b1 -> cff384d)

2021-10-13 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from e6711b1  [HUDI-2435][BUG]Fix clustering handle errors (#3666)
 add cff384d  [HUDI-2552] Fixing some test failures to unblock broken CI 
master (#3793)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/table/HoodieTable.java | 3 +--
 .../apache/hudi/client/functional/TestHoodieBackedMetadata.java  | 9 ++---
 .../java/org/apache/hudi/common/config/HoodieMetadataConfig.java | 2 +-
 .../org/apache/hudi/sink/compact/ITTestHoodieFlinkCompactor.java | 8 
 4 files changed, 12 insertions(+), 10 deletions(-)


[GitHub] [hudi] nsivabalan merged pull request #3793: [HUDI-2552] Fixing some test failures to unblock broken CI master

2021-10-13 Thread GitBox


nsivabalan merged pull request #3793:
URL: https://github.com/apache/hudi/pull/3793


   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3793: [HUDI-2552] Fixing metadata validation causing test failures

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3793:
URL: https://github.com/apache/hudi/pull/3793#issuecomment-942340315


   
   ## CI report:
   
   * 5b9557062c7872f1a49f2261037425dc9b2c0185 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2627)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2549) Exceptions when using second writer into Hudi table managed by DeltaStreamer

2021-10-13 Thread Dave Hagman (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428499#comment-17428499
 ] 

Dave Hagman commented on HUDI-2549:
---

While continuing to test, I found that the _*FileAlreadyExistsException*_ can 
occur on both the deltastreamer and secondary writers (spark datasource writers 
in my tests). On my latest run the spark datasource writer created a commit 
"ahead" of the deltastreamer. This resulted in the deltastreamer failing with 
the same error as before:
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already 
exists:s3://
This also caused a more insidious issue: The deltastreamer checkpoint state is 
now missing from recent commits and therefore it is unable to start. 

[~shivnarayan] [~vinoth] Can you confirm that you are able to reproduce this 
issue? I remember seeing that you have run this exact configuration without 
issue before. If that is the case then I am quite confused why it would not 
work for me on a brand new table. 

> Exceptions when using second writer into Hudi table managed by DeltaStreamer
> 
>
> Key: HUDI-2549
> URL: https://issues.apache.org/jira/browse/HUDI-2549
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer, Spark Integration, Writer Core
>Reporter: Dave Hagman
>Assignee: Dave Hagman
>Priority: Critical
>  Labels: multi-writer, sev:critical
> Fix For: 0.10.0
>
>
> When running the DeltaStreamer along with a second spark datasource writer 
> (with [ZK-based OCC 
> enabled|https://hudi.apache.org/docs/concurrency_control#enabling-multi-writing])
>  we receive the following exception (which halts the spark datasource 
> writer). This occurs following warnings of timeline inconsistencies:
>  
> {code:java}
> 21/10/07 17:10:05 INFO TransactionManager: Transaction ending with 
> transaction owner Option{val=[==>20211007170717__commit__INFLIGHT]}
> 21/10/07 17:10:05 INFO ZookeeperBasedLockProvider: RELEASING lock 
> atZkBasePath = /events/test/mwc/v1, lock key = events_mwc_test_v1
> 21/10/07 17:10:05 INFO ZookeeperBasedLockProvider: RELEASED lock atZkBasePath 
> = /events/test/mwc/v1, lock key = events_mwc_test_v1
> 21/10/07 17:10:05 INFO TransactionManager: Transaction ended
> Exception in thread "main" java.lang.IllegalArgumentException
> at 
> org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
> at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:414)
> at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:395)
> at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveAsComplete(HoodieActiveTimeline.java:153)
> at 
> org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:218)
> at 
> org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:190)
> at 
> org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:124)
> at 
> org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:617)
> at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:274)
> at 
> org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
> at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
> at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
>

[jira] [Reopened] (HUDI-270) [UMBRELLA] Improve Hudi website UI and documentation

2021-10-13 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reopened HUDI-270:
-
  Assignee: Kyle Weller  (was: Bhavani Sudha Saktheeswaran)

> [UMBRELLA] Improve Hudi website UI and documentation
> 
>
> Key: HUDI-270
> URL: https://issues.apache.org/jira/browse/HUDI-270
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Docs
>Reporter: Bhavani Sudha Saktheeswaran
>Assignee: Kyle Weller
>Priority: Minor
>  Labels: hudi-umbrellas
>
> This is an umbrella task of multiple tasks that aim to improve the website





[jira] [Assigned] (HUDI-1958) [Umbrella] Follow up items from 1 pass over GH issues

2021-10-13 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-1958:


Assignee: Kyle Weller  (was: Vinoth Chandar)

> [Umbrella] Follow up items from 1 pass over GH issues
> -
>
> Key: HUDI-1958
> URL: https://issues.apache.org/jira/browse/HUDI-1958
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Nishith Agarwal
>Assignee: Kyle Weller
>Priority: Blocker
>  Labels: Docs, hudi-umbrellas, release-blocker
> Fix For: 0.10.0
>
>






[GitHub] [hudi] hudi-bot edited a comment on pull request #3793: [HUDI-2552] Fixing metadata validation causing test failures

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3793:
URL: https://github.com/apache/hudi/pull/3793#issuecomment-942340315


   
   ## CI report:
   
   * 314d2f3212816795351a9961382b84630ed1069a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2626)
 
   * 5b9557062c7872f1a49f2261037425dc9b2c0185 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2627)
 
   
   
   






[GitHub] [hudi] fengjian428 commented on issue #3755: [Delta Streamer] file name mismatch with meta when compaction running

2021-10-13 Thread GitBox


fengjian428 commented on issue #3755:
URL: https://github.com/apache/hudi/issues/3755#issuecomment-942709709


   @guanziyue Do you know what this error means?
   `21/10/14 04:36:22 ERROR RequestHandler: Got runtime exception servicing 
request 
partition=TH%2F2021-01&maxinstant=20211014043535&basepath=hdfs%3A%2F%2Ftl5%2Fprojects%2Fdata_vite%2Fmysql_ingestion%2Frti_vite%2Fshopee_item_v4_db__item_v4_tab_new6&lastinstantts=20211014043614&timelinehash=5d50a0189abbb1e122f7a838ac389bb21ae27ef6db6428821c908be8f566e032
   java.lang.IllegalArgumentException: Last known instant from client was 
20211014043614 but server has the following timeline 
[[20211014042315__deltacommit__COMPLETED], 
[20211014042356__deltacommit__COMPLETED], 
[20211014042430__deltacommit__COMPLETED], 
[20211014042509__deltacommit__COMPLETED], 
[20211014042534__deltacommit__COMPLETED], [20211014042558__commit__COMPLETED], 
[20211014042607__deltacommit__COMPLETED], 
[20211014042648__deltacommit__COMPLETED], 
[20211014042713__deltacommit__COMPLETED], 
[20211014042736__deltacommit__COMPLETED], 
[20211014042758__deltacommit__COMPLETED], [20211014042820__commit__COMPLETED], 
[20211014042824__deltacommit__COMPLETED], [20211014042905__clean__COMPLETED], 
[20211014042918__deltacommit__COMPLETED], [20211014042937__clean__COMPLETED], 
[20211014042948__deltacommit__COMPLETED], [20211014043012__clean__COMPLETED], 
[20211014043022__deltacommit__COMPLETED], [20211014043047__clean__COMPLETED], 
[20211014043056__deltacommit__COMPLETED], [20211014043115__clean__COMPLETED], 
[20211014043124__commit__COMPLETED], 
[20211014043127__deltacommit__COMPLETED], [20211014043145__clean__COMPLETED], 
[20211014043313__deltacommit__COMPLETED], [20211014043351__clean__COMPLETED], 
[20211014043419__deltacommit__COMPLETED], [20211014043443__clean__COMPLETED], 
[20211014043454__deltacommit__COMPLETED], [20211014043525__clean__COMPLETED], 
[20211014043535__deltacommit__COMPLETED], [20211014043605__clean__COMPLETED], 
[20211014043614__commit__COMPLETED]]
   at 
org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
   at 
org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:510)
   at 
io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
   at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
   at 
io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
   at 
io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
   at 
io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
   at 
io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
   at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
   at 
io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
   at 
org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
   at 
org.apache.hudi.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
   at 
org.apache.hudi.org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
   at 
org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
   at 
org.apache.hudi.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
   at 
org.apache.hudi.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
   at 
org.apache.hudi.org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
   at 
org.apache.hudi.org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
   at 
org.apache.hudi.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
   at 
org.apache.hudi.org.eclipse.jetty.server.Server.handle(Server.java:502)
   at 
org.apache.hudi.org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
   at 
org.apache.hudi.org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
   at 
org.apache.hudi.org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
   at 
org.apache.hudi.org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
   at 
org.apache.hudi.org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
   at 
org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
   at 
org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
   at 
org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
   at 
org.apache.hudi.org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
   

[GitHub] [hudi] hudi-bot edited a comment on pull request #3793: [HUDI-2552] Fixing metadata validation causing test failures

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3793:
URL: https://github.com/apache/hudi/pull/3793#issuecomment-942340315


   
   ## CI report:
   
   * 314d2f3212816795351a9961382b84630ed1069a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2626)
 
   * 5b9557062c7872f1a49f2261037425dc9b2c0185 UNKNOWN
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3793: [HUDI-2552] Fixing metadata validation causing test failures

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3793:
URL: https://github.com/apache/hudi/pull/3793#issuecomment-942340315


   
   ## CI report:
   
   * c9019d52d97deeec182234c18f87625537bf602c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2624)
 
   * 314d2f3212816795351a9961382b84630ed1069a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2626)
 
   * 5b9557062c7872f1a49f2261037425dc9b2c0185 UNKNOWN
   
   
   






[GitHub] [hudi] nsivabalan commented on pull request #3793: [HUDI-2552] Fixing metadata validation causing test failures

2021-10-13 Thread GitBox


nsivabalan commented on pull request #3793:
URL: https://github.com/apache/hudi/pull/3793#issuecomment-942679228


   @hudi-bot azure run






[GitHub] [hudi] hudi-bot edited a comment on pull request #3793: [HUDI-2552] Fixing metadata validation causing test failures

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3793:
URL: https://github.com/apache/hudi/pull/3793#issuecomment-942340315


   
   ## CI report:
   
   * c9019d52d97deeec182234c18f87625537bf602c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2624)
 
   * 314d2f3212816795351a9961382b84630ed1069a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2626)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3793: [HUDI-2552] Fixing metadata validation causing test failures

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3793:
URL: https://github.com/apache/hudi/pull/3793#issuecomment-942340315


   
   ## CI report:
   
   * c9019d52d97deeec182234c18f87625537bf602c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2624)
 
   * 314d2f3212816795351a9961382b84630ed1069a UNKNOWN
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3793: [HUDI-2552] Fixing metadata validation causing test failures

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3793:
URL: https://github.com/apache/hudi/pull/3793#issuecomment-942340315


   
   ## CI report:
   
   * c9019d52d97deeec182234c18f87625537bf602c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2624)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346


   
   ## CI report:
   
   * 3b08956e6b53ba25be60a659ac6d28d147d9a77b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2625)
 
   
   
   






[GitHub] [hudi] yihua commented on a change in pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-13 Thread GitBox


yihua commented on a change in pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#discussion_r728307580



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##
@@ -18,39 +18,277 @@
 
 package org.apache.hudi.table.action.compact;
 
+import org.apache.hudi.avro.model.HoodieCompactionOperation;
 import org.apache.hudi.avro.model.HoodieCompactionPlan;
+import org.apache.hudi.client.AbstractHoodieWriteClient;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.data.HoodieAccumulator;
+import org.apache.hudi.common.data.HoodieData;
 import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.engine.TaskContextSupplier;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.model.CompactionOperation;
+import org.apache.hudi.common.model.HoodieBaseFile;
 import org.apache.hudi.common.model.HoodieFileGroupId;
+import org.apache.hudi.common.model.HoodieLogFile;
 import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.model.HoodieWriteStat.RuntimeStats;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner;
+import org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.table.view.TableFileSystemView.SliceView;
+import org.apache.hudi.common.util.CollectionUtils;
+import org.apache.hudi.common.util.CompactionUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.io.IOUtils;
+import org.apache.hudi.table.HoodieCopyOnWriteTableOperation;
 import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.compact.strategy.CompactionStrategy;
+
+import org.apache.avro.Schema;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
 
 import java.io.IOException;
 import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
 import java.util.Set;
+import java.util.stream.StreamSupport;
+
+import static java.util.stream.Collectors.toList;
 
 /**
  * A HoodieCompactor runs compaction on a hoodie table.
  */
-public interface HoodieCompactor 
extends Serializable {
+public abstract class HoodieCompactor 
implements Serializable {
+
+  private static final Logger LOG = 
LogManager.getLogger(HoodieCompactor.class);
 
   /**
-   * Generate a new compaction plan for scheduling.
+   * @param config Write config.
+   * @return the reader schema for {@link HoodieMergedLogRecordScanner}.
+   */
+  public abstract Schema getReaderSchema(HoodieWriteConfig config);
+
+  /**
+   * Updates the reader schema for actual compaction operations.
*
-   * @param context HoodieEngineContext
-   * @param hoodieTable Hoodie Table
-   * @param config Hoodie Write Configuration
-   * @param compactionCommitTime scheduled compaction commit time
-   * @param fgIdsInPendingCompactions partition-fileId pairs for which 
compaction is pending
-   * @return Compaction Plan
-   * @throws IOException when encountering errors
+   * @param config Write config.
+   * @param metaClient {@link HoodieTableMetaClient} instance to use.
*/
-  HoodieCompactionPlan generateCompactionPlan(HoodieEngineContext context, 
HoodieTable hoodieTable, HoodieWriteConfig config,
-  String compactionCommitTime, 
Set fgIdsInPendingCompactions) throws IOException;
+  public abstract void updateReaderSchema(HoodieWriteConfig config, 
HoodieTableMetaClient metaClient);
+
+  /**
+   * Handles the compaction timeline based on the compaction instant.
+   *
+   * @param table {@link HoodieTable} instance to use.
+   * @param pendingCompactionTimeline pending compaction timeline.
+   * @param compactionInstantTime compaction instant
+   * @param writeClient   Write client.
+   */
+  public abstract void handleCompactionTimeline(
+  HoodieTable table, HoodieTimeline pendingCompactionTimeline,

Review comment:
   Fixed.








[GitHub] [hudi] nsivabalan commented on a change in pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-13 Thread GitBox


nsivabalan commented on a change in pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#discussion_r728244437



##
File path: 
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java
##
@@ -126,23 +130,21 @@ protected BaseTableMetadata(HoodieEngineContext 
engineContext, HoodieMetadataCon
   }
 
   @Override
-  public Map getAllFilesInPartitions(List 
partitionPaths)
+  public Map getAllFilesInPartitions(List 
partitions)
   throws IOException {
 if (enabled) {
-  Map partitionsFilesMap = new HashMap<>();
-
   try {
-for (String partitionPath : partitionPaths) {
-  partitionsFilesMap.put(partitionPath, fetchAllFilesInPartition(new 
Path(partitionPath)));
-}
+// need to understand why we did not make bulk get before

Review comment:
   @prashantwason @satishkotha: do you know why we fetched one key at a time 
here instead of doing a batch get? Is there any particular reason for it? I 
have changed it to do a batch get in this patch.
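
For readers following the thread, here is a minimal sketch of the difference between the removed per-partition loop and a batched lookup. It is not Hudi code: `BatchGetSketch`, `CountingStore`, `fetchOneByOne`, and `fetchBatch` are hypothetical names, and the in-memory store only counts read calls. It does not model how an HFile is actually scanned, so it says nothing about which approach is faster in practice.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchGetSketch {

  /** Hypothetical key-value store that counts how many read calls it serves. */
  static final class CountingStore {
    private final Map<String, List<String>> filesByPartition;
    int readCalls = 0;

    CountingStore(Map<String, List<String>> filesByPartition) {
      this.filesByPartition = filesByPartition;
    }

    // Point lookup: one read call per key.
    List<String> get(String partition) {
      readCalls++;
      return filesByPartition.getOrDefault(partition, Collections.emptyList());
    }

    // Batch lookup: one read call for the whole key list.
    Map<String, List<String>> getAll(List<String> partitions) {
      readCalls++;
      Map<String, List<String>> result = new HashMap<>();
      for (String partition : partitions) {
        result.put(partition, filesByPartition.getOrDefault(partition, Collections.emptyList()));
      }
      return result;
    }
  }

  // Same shape as the removed loop: one lookup per partition.
  static Map<String, List<String>> fetchOneByOne(CountingStore store, List<String> partitions) {
    Map<String, List<String>> result = new HashMap<>();
    for (String partition : partitions) {
      result.put(partition, store.get(partition));
    }
    return result;
  }

  // Same idea as the batched path: a single lookup for all partitions.
  static Map<String, List<String>> fetchBatch(CountingStore store, List<String> partitions) {
    return store.getAll(partitions);
  }

  public static void main(String[] args) {
    Map<String, List<String>> data = new HashMap<>();
    data.put("2021/10/12", Arrays.asList("a.parquet"));
    data.put("2021/10/13", Arrays.asList("b.parquet", "c.parquet"));
    List<String> partitions = Arrays.asList("2021/10/12", "2021/10/13");

    CountingStore perKey = new CountingStore(data);
    CountingStore batched = new CountingStore(data);

    // Both paths return the same mapping; only the number of read calls differs.
    System.out.println(fetchOneByOne(perKey, partitions).equals(fetchBatch(batched, partitions)));
    System.out.println(perKey.readCalls + " read calls vs " + batched.readCalls);
  }
}
```

In this toy model the batched path issues one read call regardless of the number of partitions. Whether that wins in the real reader depends on how much of the underlying file a batch lookup has to read, which the sketch deliberately leaves out.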








[GitHub] [hudi] prashantwason commented on a change in pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-13 Thread GitBox


prashantwason commented on a change in pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#discussion_r728264572



##
File path: 
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java
##
@@ -126,23 +130,21 @@ protected BaseTableMetadata(HoodieEngineContext 
engineContext, HoodieMetadataCon
   }
 
   @Override
-  public Map getAllFilesInPartitions(List 
partitionPaths)
+  public Map getAllFilesInPartitions(List 
partitions)
   throws IOException {
 if (enabled) {
-  Map partitionsFilesMap = new HashMap<>();
-
   try {
-for (String partitionPath : partitionPaths) {
-  partitionsFilesMap.put(partitionPath, fetchAllFilesInPartition(new 
Path(partitionPath)));
-}
+// need to understand why we did not make bulk get before

Review comment:
   For simplicity of implementation I suppose - performance was not taken 
into consideration. Also, given the number of keys being fetched, batch would 
be slower as it may need to read the entire hfile.
   
   @umehrot2  Thoughts?








[GitHub] [hudi] hudi-bot edited a comment on pull request #3741: [HUDI-2501] Add HoodieData abstraction and refactor compaction actions in hudi-client module

2021-10-13 Thread GitBox


hudi-bot edited a comment on pull request #3741:
URL: https://github.com/apache/hudi/pull/3741#issuecomment-931660346










[GitHub] [hudi] nsivabalan commented on pull request #3793: [HUDI-2552] Fixing metadata validation causing test failures

2021-10-13 Thread GitBox


nsivabalan commented on pull request #3793:
URL: https://github.com/apache/hudi/pull/3793#issuecomment-942595144


   @hudi-bot azure run





