[GitHub] [hudi] ssandona commented on issue #7032: [SUPPORT] When metatable enabled, some query using index column as filter will get empty result

2022-10-23 Thread GitBox


ssandona commented on issue #7032:
URL: https://github.com/apache/hudi/issues/7032#issuecomment-1288506888

   We are observing the same behavior with Hudi 0.11.1. In our case we are 
filtering by a string column containing a timestamp like "202001110858". We 
obtain different results depending on whether "hoodie.enable.data.skipping" is 
enabled or disabled.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map

2022-10-23 Thread GitBox


danny0405 commented on code in PR #6632:
URL: https://github.com/apache/hudi/pull/6632#discussion_r1002934556


##
hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java:
##
@@ -202,22 +199,19 @@ public R get(Object key) {
 
   @Override
   public R put(T key, R value) {
+if (this.currentInMemoryMapSize >= maxInMemorySizeInBytes || inMemoryMap.size() % NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE == 0) {

Review Comment:
   What is the purpose of the estimation pre-check 
`this.currentInMemoryMapSize >= maxInMemorySizeInBytes`?
   
   
   And why do we have this expression:
   ```java
   this.estimatedPayloadSize = (long) (this.estimatedPayloadSize * 0.9
       + (keySizeEstimator.sizeEstimate(key) + valueSizeEstimator.sizeEstimate(value)) * 0.1)
   ```
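
For context, the quoted expression has the shape of an exponentially weighted moving average: the previous estimate keeps 90% of its weight and the newly measured key-plus-value size contributes 10%. A minimal sketch of that arithmetic follows, assuming a hypothetical helper class `PayloadSizeEma` (not part of Hudi) and made-up sizes; only the 0.9 / 0.1 weights come from the quoted code.

```java
// Illustrative sketch only: PayloadSizeEma and update() are hypothetical names, not Hudi APIs.
// The 0.9 / 0.1 weights are taken from the expression quoted in the review comment.
final class PayloadSizeEma {
  private PayloadSizeEma() {}

  /** Blends the previous estimate (weight 0.9) with a newly sampled size (weight 0.1). */
  static long update(long previousEstimate, long sampledSize) {
    return (long) (previousEstimate * 0.9 + sampledSize * 0.1);
  }

  public static void main(String[] args) {
    long estimate = 1000L;              // assumed current estimate in bytes
    estimate = update(estimate, 2048L); // one sample roughly twice as large
    System.out.println(estimate);       // prints 1104
  }
}
```

A single outlier shifts the estimate by only a tenth of the difference, which smooths the value at the cost of reacting slowly to a lasting change in record size.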
   







[jira] [Created] (HUDI-5083) A bug occurs when the schema changes multiple times to a once existed column

2022-10-23 Thread shenshengli (Jira)
shenshengli created HUDI-5083:
-

 Summary: A bug occurs when the schema changes multiple times to a 
once existed column
 Key: HUDI-5083
 URL: https://issues.apache.org/jira/browse/HUDI-5083
 Project: Apache Hudi
  Issue Type: Bug
Reporter: shenshengli






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] danny0405 commented on a diff in pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map

2022-10-23 Thread GitBox


danny0405 commented on code in PR #6632:
URL: https://github.com/apache/hudi/pull/6632#discussion_r1002932637


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##
@@ -521,11 +522,16 @@ private void writeToBuffer(HoodieRecord record) {
* Checks if the number of records have reached the set threshold and then flushes the records to disk.
*/
   private void flushToDiskIfRequired(HoodieRecord record) {
+if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize)
+|| numberOfRecords % NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE == 0) {
+  averageRecordSize = (long) (averageRecordSize * 0.8 + sizeEstimator.sizeEstimate(record) * 0.2);
+}
+
 // Append if max number of records reached to achieve block size
 if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize)) {
   // Recompute averageRecordSize before writing a new block and update existing value with
   // avg of new and old
-  LOG.info("AvgRecordSize => " + averageRecordSize);
+  LOG.info("Flush log block to disk, the current avgRecordSize => " + averageRecordSize);

Review Comment:
   What's the problem if we only estimate the record size when flushing?
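
One way to reason about this question is to simulate both strategies. The sketch below is an assumption-laden illustration, not Hudi code: the class `FlushEstimateSketch`, its parameters, the sampling interval of 100, and all sizes are hypothetical. Only the 0.8 / 0.2 weights and the two refresh conditions are taken from the diff above.

```java
// Illustrative sketch only: FlushEstimateSketch and its parameters are hypothetical, not Hudi code.
// The 0.8 / 0.2 weights and the two refresh conditions mirror the diff; all sizes are made up.
final class FlushEstimateSketch {
  private FlushEstimateSketch() {}

  /**
   * Counts how many records are buffered before the first flush.
   * A sampleInterval <= 0 means the average is refreshed only once the flush threshold is reached.
   */
  static long recordsBeforeFirstFlush(long maxBlockSize, long initialAvgSize,
                                      long actualRecordSize, int sampleInterval) {
    if (maxBlockSize <= 0 || initialAvgSize <= 0 || actualRecordSize <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    long avg = initialAvgSize;
    long buffered = 0;
    while (true) {
      buffered++;
      boolean thresholdReached = buffered >= maxBlockSize / avg;
      boolean sampleNow = sampleInterval > 0 && buffered % sampleInterval == 0;
      if (thresholdReached || sampleNow) {
        // Weighted update: 80% old average, 20% newly measured record size.
        avg = (long) (avg * 0.8 + actualRecordSize * 0.2);
      }
      if (buffered >= maxBlockSize / avg) {
        return buffered; // a flush would happen here
      }
    }
  }

  public static void main(String[] args) {
    // Assumed scenario: 100 KB target block, stale 100-byte average, real records of 1000 bytes.
    System.out.println(recordsBeforeFirstFlush(100_000L, 100L, 1_000L, 0));
    System.out.println(recordsBeforeFirstFlush(100_000L, 100L, 1_000L, 100));
  }
}
```

With these assumed numbers, the flush-only variant buffers 1000 records (about 1 MB) before the first flush, while the sampled variant flushes after 235 records. Both overshoot the 100 KB target, but periodic sampling corrects a stale average much earlier.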






[GitHub] [hudi] Zouxxyy commented on pull request #6999: [HUDI-5057] Fix msck repair hudi table

2022-10-23 Thread GitBox


Zouxxyy commented on PR #6999:
URL: https://github.com/apache/hudi/pull/6999#issuecomment-1288485908

   @Zouxxyy Fixed all comments





[GitHub] [hudi] danny0405 commented on a diff in pull request #6976: [HUDI-5042]fix clustering schedule problem in flink

2022-10-23 Thread GitBox


danny0405 commented on code in PR #6976:
URL: https://github.com/apache/hudi/pull/6976#discussion_r1002916349


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java:
##
@@ -171,7 +171,7 @@ public static HoodieWriteConfig getHoodieClientConfig(
 
 .withMergeAllowDuplicateOnInserts(OptionsResolver.insertClustering(conf))
 .withClusteringConfig(
 HoodieClusteringConfig.newBuilder()
-.withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED))
+.withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_ASYNC_ENABLED) || conf.getBoolean(FlinkOptions.CLUSTERING_SCHEDULE_ENABLED))

Review Comment:
   ```java
   // mainly for clustering scheduling
   withAsyncClustering(conf.getBoolean(FlinkOptions.CLUSTERING_SCHEDULE_ENABLED))
   ```
   
   Configuring it with only the scheduling option should work.






[GitHub] [hudi] YannByron commented on issue #6931: SparkSQL create hudi DDL do not support hoodie.datasource.write.operation = 'insert'

2022-10-23 Thread GitBox


YannByron commented on issue #6931:
URL: https://github.com/apache/hudi/issues/6931#issuecomment-1288475156

   @nsivabalan I think we can close this.
   The spark-sql side of this issue has been explained by @Zouxxyy and 
@boneanxs, and @Zouxxyy provided a PR https://github.com/apache/hudi/pull/6949 
describing how to work with `hoodie.datasource.write.operation` and 
`hoodie.merge.allow.duplicate.on.inserts`. If there is still an issue with 
flink-sql, it is better to create a new issue to follow up.





[jira] [Assigned] (HUDI-5082) Improve the cdc log file name format

2022-10-23 Thread Yann Byron (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yann Byron reassigned HUDI-5082:


Assignee: Yann Byron

> Improve the cdc log file name format
> 
>
> Key: HUDI-5082
> URL: https://issues.apache.org/jira/browse/HUDI-5082
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>






[jira] [Created] (HUDI-5082) Improve the cdc log file name format

2022-10-23 Thread Yann Byron (Jira)
Yann Byron created HUDI-5082:


 Summary: Improve the cdc log file name format
 Key: HUDI-5082
 URL: https://issues.apache.org/jira/browse/HUDI-5082
 Project: Apache Hudi
  Issue Type: Improvement
  Components: core
Reporter: Yann Byron








[GitHub] [hudi] hudi-bot commented on pull request #7000: [HUDI-5060] Make all clean policies support incremental mode to find partition paths

2022-10-23 Thread GitBox


hudi-bot commented on PR #7000:
URL: https://github.com/apache/hudi/pull/7000#issuecomment-1288469554

   
   ## CI report:
   
   * f0c09d506905d6e80f109b900e6e04bacffec4e6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12500)
 
   * f77806bcd4a38c2f4c1d44e970199d19bfc72737 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12511)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7000: [HUDI-5060] Make all clean policies support incremental mode to find partition paths

2022-10-23 Thread GitBox


hudi-bot commented on PR #7000:
URL: https://github.com/apache/hudi/pull/7000#issuecomment-1288464282

   
   ## CI report:
   
   * f0c09d506905d6e80f109b900e6e04bacffec4e6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12500)
 
   * f77806bcd4a38c2f4c1d44e970199d19bfc72737 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6976: [HUDI-5042]fix clustering schedule problem in flink

2022-10-23 Thread GitBox


hudi-bot commented on PR #6976:
URL: https://github.com/apache/hudi/pull/6976#issuecomment-1288464150

   
   ## CI report:
   
   * 89e792414d09daaa8a367ecc4011450cc21e7069 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12386)
 
   * c6954076cdb8818f7df54e297a9184c48c7217d0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12510)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6976: [HUDI-5042]fix clustering schedule problem in flink

2022-10-23 Thread GitBox


hudi-bot commented on PR #6976:
URL: https://github.com/apache/hudi/pull/6976#issuecomment-1288459087

   
   ## CI report:
   
   * 89e792414d09daaa8a367ecc4011450cc21e7069 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12386)
 
   * c6954076cdb8818f7df54e297a9184c48c7217d0 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7018: [HUDI-5067] Merge the columns stats of multiple log blocks from the s…

2022-10-23 Thread GitBox


hudi-bot commented on PR #7018:
URL: https://github.com/apache/hudi/pull/7018#issuecomment-1288455437

   
   ## CI report:
   
   * 2fd1d5ab5b34cdf0b5f9042e38efccd6b8091a60 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12501)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto

2022-10-23 Thread GitBox


hudi-bot commented on PR #6989:
URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288420287

   
   ## CI report:
   
   * 11d8108e89bc1de462978acbaee3905f9cb9edba Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12447)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12503)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7042: [MINOR] Improve the cdc log file name format

2022-10-23 Thread GitBox


hudi-bot commented on PR #7042:
URL: https://github.com/apache/hudi/pull/7042#issuecomment-1288417562

   
   ## CI report:
   
   * 6ad211f90e9d94467ca6888e11bc28903b79ad15 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12509)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7003: [minor] add more test for rfc46

2022-10-23 Thread GitBox


hudi-bot commented on PR #7003:
URL: https://github.com/apache/hudi/pull/7003#issuecomment-1288417452

   
   ## CI report:
   
   * 77ff687b1e0e945d6658ffe47992bd85484d78b2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12497)
 
   * 01c496501a59412c66df656a6d8801f1d2c45d6b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12508)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map

2022-10-23 Thread GitBox


hudi-bot commented on PR #6632:
URL: https://github.com/apache/hudi/pull/6632#issuecomment-1288417059

   
   ## CI report:
   
   * d9e12ddf962b670b8ec1e2260d5389c688e16001 UNKNOWN
   * ba3513d5b65e39f7cbb71e851ddd34cfe9d846a0 UNKNOWN
   * 0836cbf5794ede5be427ef529cf7b660c2a6f4fa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12480)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12490)
 
   * 8b7f94e6743c5f2decfeddaf164585a2a471a6c6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12507)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency

2022-10-23 Thread GitBox


hudi-bot commented on PR #5416:
URL: https://github.com/apache/hudi/pull/5416#issuecomment-1288416366

   
   ## CI report:
   
   * b838e1f406902c9bdfb5e84d53ef5a5effd0765b UNKNOWN
   * 6114ee2aa59f087e5ef0b1b53979eec143b33f5e UNKNOWN
   * 92760dbf5a047fe1f9941fa4b36c944eb3bec5c7 UNKNOWN
   * 0fc24d3af4181d8fb68d803b97be78e5cd448787 UNKNOWN
   * 298f66d2842b1fa3ce9c487fd3d0f94eda4bd2b1 UNKNOWN
   * a81ffdf9c24a3f2f984161ca193af3b387b1e9a1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12324)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12331)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12348)
 
   * e5c17f060235551dd9130e7bc7bbc33b294ebb18 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12506)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7042: [MINOR] Improve the cdc log file name format

2022-10-23 Thread GitBox


hudi-bot commented on PR #7042:
URL: https://github.com/apache/hudi/pull/7042#issuecomment-1288414199

   
   ## CI report:
   
   * 6ad211f90e9d94467ca6888e11bc28903b79ad15 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7003: [minor] add more test for rfc46

2022-10-23 Thread GitBox


hudi-bot commented on PR #7003:
URL: https://github.com/apache/hudi/pull/7003#issuecomment-1288414073

   
   ## CI report:
   
   * 77ff687b1e0e945d6658ffe47992bd85484d78b2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12497)
 
   * 01c496501a59412c66df656a6d8801f1d2c45d6b UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6946: [HUDI-5027] Improve getHBaseConnection Use Constants Replace HardCode.

2022-10-23 Thread GitBox


hudi-bot commented on PR #6946:
URL: https://github.com/apache/hudi/pull/6946#issuecomment-1288413958

   
   ## CI report:
   
   * 86099181bd76a59cdd1b537eb724f6f51ed0c711 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12494)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map

2022-10-23 Thread GitBox


hudi-bot commented on PR #6632:
URL: https://github.com/apache/hudi/pull/6632#issuecomment-1288413659

   
   ## CI report:
   
   * d9e12ddf962b670b8ec1e2260d5389c688e16001 UNKNOWN
   * ba3513d5b65e39f7cbb71e851ddd34cfe9d846a0 UNKNOWN
   * 0836cbf5794ede5be427ef529cf7b660c2a6f4fa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12480)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12490)
 
   * 8b7f94e6743c5f2decfeddaf164585a2a471a6c6 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

2022-10-23 Thread GitBox


hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1288413251

   
   ## CI report:
   
   * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5416: [HUDI-3963] Use Lock-Free Message Queue Disruptor Improving Hoodie Writing Efficiency

2022-10-23 Thread GitBox


hudi-bot commented on PR #5416:
URL: https://github.com/apache/hudi/pull/5416#issuecomment-1288413070

   
   ## CI report:
   
   * b838e1f406902c9bdfb5e84d53ef5a5effd0765b UNKNOWN
   * 6114ee2aa59f087e5ef0b1b53979eec143b33f5e UNKNOWN
   * 92760dbf5a047fe1f9941fa4b36c944eb3bec5c7 UNKNOWN
   * 0fc24d3af4181d8fb68d803b97be78e5cd448787 UNKNOWN
   * 298f66d2842b1fa3ce9c487fd3d0f94eda4bd2b1 UNKNOWN
   * a81ffdf9c24a3f2f984161ca193af3b387b1e9a1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12324)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12331)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12348)
 
   * e5c17f060235551dd9130e7bc7bbc33b294ebb18 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7001: [HUDI-5061] bulk insert operation don't throw other exception except IOE Exception

2022-10-23 Thread GitBox


hudi-bot commented on PR #7001:
URL: https://github.com/apache/hudi/pull/7001#issuecomment-1288410932

   
   ## CI report:
   
   * 03cb91b295d74d1fa7daf73592e53df76c84bc85 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12485)
 
   * 67282ced98d0531a1096bcc418c0126836d0fb51 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12504)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7000: [HUDI-5060] Make all clean policies support incremental mode to find partition paths

2022-10-23 Thread GitBox


hudi-bot commented on PR #7000:
URL: https://github.com/apache/hudi/pull/7000#issuecomment-1288410910

   
   ## CI report:
   
   * f0c09d506905d6e80f109b900e6e04bacffec4e6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12500)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6946: [HUDI-5027] Improve getHBaseConnection Use Constants Replace HardCode.

2022-10-23 Thread GitBox


hudi-bot commented on PR #6946:
URL: https://github.com/apache/hudi/pull/6946#issuecomment-1288410810

   
   ## CI report:
   
   * 86099181bd76a59cdd1b537eb724f6f51ed0c711 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

2022-10-23 Thread GitBox


hudi-bot commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1288410026

   
   ## CI report:
   
   * 7982061f9d492b4c4d51ca4589e5a30dbc76530a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9860)
 
   
   



[GitHub] [hudi] trushev commented on pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

2022-10-23 Thread GitBox


trushev commented on PR #5830:
URL: https://github.com/apache/hudi/pull/5830#issuecomment-1288409306

   @hudi-bot run azure





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

2022-10-23 Thread GitBox


trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002875151


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/SchemaEvolutionContext.java:
##
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.format;
+
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+import org.apache.hudi.common.util.InternalSchemaCache;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.internal.schema.InternalSchema;
+import org.apache.hudi.internal.schema.Types;
+import org.apache.hudi.internal.schema.action.InternalSchemaMerger;
+import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter;
+import org.apache.hudi.table.format.mor.MergeOnReadInputSplit;
+import org.apache.hudi.util.AvroSchemaConverter;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.table.types.logical.LogicalType;
+
+import java.io.Serializable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * This class is responsible for calculating names and types of fields that 
are actual at a certain point in time.
+ * If field is renamed in queried schema, its old name will be returned, which 
is relevant at the provided time.
+ * If type of field is changed, its old type will be returned, and projection 
will be created that will convert the old type to the queried one.
+ */
+public final class SchemaEvolutionContext implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  private final HoodieTableMetaClient metaClient;
+  private final InternalSchema querySchema;
+
+  public static Option<SchemaEvolutionContext> of(Configuration conf) {
+if (conf.getBoolean(FlinkOptions.SCHEMA_EVOLUTION_ENABLED)) {
+  HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+  return new TableSchemaResolver(metaClient)
+  .getTableInternalSchemaFromCommitMetadata()
+  .map(schema -> new SchemaEvolutionContext(metaClient, schema));
+} else {
+  return Option.empty();
+}
+  }
+
+  public SchemaEvolutionContext(HoodieTableMetaClient metaClient, 
InternalSchema querySchema) {
+this.metaClient = metaClient;
+this.querySchema = querySchema;
+  }
+
+  public InternalSchema getQuerySchema() {
+return querySchema;
+  }
+
+  public InternalSchema getActualSchema(FileInputSplit fileSplit) {
+return 
getActualSchema(FSUtils.getCommitTime(fileSplit.getPath().getName()));
+  }
+
+  public InternalSchema getActualSchema(MergeOnReadInputSplit split) {
+String commitTime = split.getBasePath()
+.map(FSUtils::getCommitTime)
+.orElse(split.getLatestCommit());
+return getActualSchema(commitTime);
+  }
+
+  public List<String> getFieldNames(InternalSchema internalSchema) {
+return 
internalSchema.columns().stream().map(Types.Field::name).collect(Collectors.toList());
+  }
+
+  public List<DataType> getFieldTypes(InternalSchema internalSchema) {
+return AvroSchemaConverter.convertToDataType(

Review Comment:
   Fixed



##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/CastMap.java:
##
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, e

[jira] [Updated] (HUDI-3303) CI tests Improvements

2022-10-23 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3303:
-
Epic Name: CI tests improvements  (was: CI Improvements)

> CI tests Improvements
> -
>
> Key: HUDI-3303
> URL: https://issues.apache.org/jira/browse/HUDI-3303
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: tests-ci
>Reporter: Raymond Xu
>Priority: Blocker
>
> Automate tests that need to be manually performed before releases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-5081) Resources clean-up in hudi-utilities tests

2022-10-23 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-5081.

Resolution: Fixed

> Resources clean-up in hudi-utilities tests
> --
>
> Key: HUDI-5081
> URL: https://issues.apache.org/jira/browse/HUDI-5081
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: Raymond Xu
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>






[hudi] branch master updated (0bc3eb8aab -> fa04e814cd)

2022-10-23 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 0bc3eb8aab [HUDI-4971] Remove direct use of kryo from `SerDeUtils` 
(#7014)
 add fa04e814cd [HUDI-5081] Tests clean up in hudi-utilities (#7033)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/utilities/TestHoodieIndexer.java   | 205 -
 .../deser/TestKafkaAvroSchemaDeserializer.java |   3 +-
 .../utilities/sources/TestGcsEventsSource.java |  13 --
 3 files changed, 79 insertions(+), 142 deletions(-)



[GitHub] [hudi] xushiyan merged pull request #7033: [HUDI-5081] Tests clean up in hudi-utilities

2022-10-23 Thread GitBox


xushiyan merged PR #7033:
URL: https://github.com/apache/hudi/pull/7033





[GitHub] [hudi] xushiyan commented on pull request #7033: [HUDI-5081] Tests clean up in hudi-utilities

2022-10-23 Thread GitBox


xushiyan commented on PR #7033:
URL: https://github.com/apache/hudi/pull/7033#issuecomment-1288403021

   The CI failure is due to unrelated flakiness. This change only touches the 
utilities tests, which have passed.





[jira] [Updated] (HUDI-5081) Resources clean-up in hudi-utilities tests

2022-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5081:
-
Labels: pull-request-available  (was: )

> Resources clean-up in hudi-utilities tests
> --
>
> Key: HUDI-5081
> URL: https://issues.apache.org/jira/browse/HUDI-5081
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: Raymond Xu
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>






[GitHub] [hudi] xushiyan commented on a diff in pull request #7033: [HUDI-5081] Tests clean up in hudi-utilities

2022-10-23 Thread GitBox


xushiyan commented on code in PR #7033:
URL: https://github.com/apache/hudi/pull/7033#discussion_r1002872498


##
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieIndexer.java:
##
@@ -75,46 +68,29 @@
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
-public class TestHoodieIndexer extends HoodieCommonTestHarness implements 
SparkProvider {
+public class TestHoodieIndexer extends SparkClientFunctionalTestHarness 
implements SparkProvider {

Review Comment:
   nit: `SparkProvider` is already implemented by `SparkClientFunctionalTestHarness`






[jira] [Updated] (HUDI-5081) Resources clean-up in hudi-utilities tests

2022-10-23 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-5081:
-
Fix Version/s: 0.12.2

> Resources clean-up in hudi-utilities tests
> --
>
> Key: HUDI-5081
> URL: https://issues.apache.org/jira/browse/HUDI-5081
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: Raymond Xu
>Assignee: Timothy Brown
>Priority: Major
> Fix For: 0.12.2
>
>






[jira] [Created] (HUDI-5081) Resources clean-up in hudi-utilities tests

2022-10-23 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-5081:


 Summary: Resources clean-up in hudi-utilities tests
 Key: HUDI-5081
 URL: https://issues.apache.org/jira/browse/HUDI-5081
 Project: Apache Hudi
  Issue Type: Task
  Components: tests-ci
Reporter: Raymond Xu
Assignee: Timothy Brown








[jira] [Updated] (HUDI-3303) CI tests Improvements

2022-10-23 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3303:
-
Summary: CI tests Improvements  (was: CI Improvements)

> CI tests Improvements
> -
>
> Key: HUDI-3303
> URL: https://issues.apache.org/jira/browse/HUDI-3303
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: tests-ci
>Reporter: Raymond Xu
>Priority: Blocker
>
> Automate tests that need to be manually performed before releases.





[GitHub] [hudi] YannByron commented on a diff in pull request #6999: [HUDI-5057] Fix msck repair hudi table

2022-10-23 Thread GitBox


YannByron commented on code in PR #6999:
URL: https://github.com/apache/hudi/pull/6999#discussion_r1002869098


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/RepairHoodieTableCommand.scala:
##
@@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi.command
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileSystem, Path, PathFilter}
+import org.apache.hadoop.mapred.{FileInputFormat, JobConf}
+
+import org.apache.hudi.common.table.HoodieTableConfig
+
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
+import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.execution.command.PartitionStatistics
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.util.{SerializableConfiguration, ThreadUtils}
+
+import java.util.concurrent.TimeUnit.MILLISECONDS
+
+import scala.util.control.NonFatal
+
+/**
+ * Command for repair hudi table's partitions.
+ * Use hoodieCatalogTable.getPartitionPaths() to get partitions instead of 
scanning the file system.
+ */
+case class RepairHoodieTableCommand(tableName: TableIdentifier,
+enableAddPartitions: Boolean,
+enableDropPartitions: Boolean,
+cmd: String = "MSCK REPAIR TABLE") extends 
HoodieLeafRunnableCommand {
+
+  // These are list of statistics that can be collected quickly without 
requiring a scan of the data
+  // see https://github.com/apache/hive/blob/master/
+  //   common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java
+  val NUM_FILES = "numFiles"
+  val TOTAL_SIZE = "totalSize"
+  val DDL_TIME = "transient_lastDdlTime"
+
+  private def getPathFilter(hadoopConf: Configuration): PathFilter = {
+// Dummy jobconf to get to the pathFilter defined in configuration
+// It's very expensive to create a JobConf(ClassUtil.findContainingJar() 
is slow)
+val jobConf = new JobConf(hadoopConf, this.getClass)
+val pathFilter = FileInputFormat.getInputPathFilter(jobConf)
+new PathFilter {
+  override def accept(path: Path): Boolean = {
+val name = path.getName
+if (name != "_SUCCESS" && name != "_temporary" && 
!name.startsWith(".")) {
+  pathFilter == null || pathFilter.accept(path)
+} else {
+  false
+}
+  }
+}
+  }
+
+  override def run(spark: SparkSession): Seq[Row] = {
+val catalog = spark.sessionState.catalog
+val table = catalog.getTableMetadata(tableName)
+val tableIdentWithDB = table.identifier.quotedString
+if (table.partitionColumnNames.isEmpty) {
+  throw new AnalysisException(
+s"Operation not allowed: $cmd only works on partitioned tables: 
$tableIdentWithDB")
+}
+
+if (table.storage.locationUri.isEmpty) {
+  throw new AnalysisException(s"Operation not allowed: $cmd only works on 
table with " +
+s"location provided: $tableIdentWithDB")
+}
+
+val root = new Path(table.location)
+logInfo(s"Recover all the partitions in $root")
+
+val hoodieCatalogTable = HoodieCatalogTable(spark, table.identifier)
+val isHiveStyledPartitioning = hoodieCatalogTable.catalogProperties.
+  getOrElse(HoodieTableConfig.HIVE_STYLE_PARTITIONING_ENABLE.key, 
"true").equals("true")
+val partitionSpecsAndLocs: Seq[(TablePartitionSpec, Path)] = 
hoodieCatalogTable.getPartitionPaths.map(partitionPath => {
+  var values = partitionPath.split('/')
+  if (isHiveStyledPartitioning) {
+values = values.map(_.split('=')(1))
+  }
+  (table.partitionColumnNames.zip(values).toMap, new Path(root, 
partitionPath))
+})
+
+val droppedAmount = if (enableDropPartitions) {
+  dropPartitions(catalog, partitionSpecsAndLocs)
+} else 0
+val addedAmount = if (enableAddPartitions) {
+  val hadoopConf = spark.sessionState.newHadoopConf()
+  val fs = root.getFileSystem(hadoopConf)
+  val pathFilter = getPathFilter(hadoopConf)
+  val threshold = 
spark.sparkContext.conf

[GitHub] [hudi] YannByron commented on a diff in pull request #6999: [HUDI-5057] Fix msck repair hudi table

2022-10-23 Thread GitBox


YannByron commented on code in PR #6999:
URL: https://github.com/apache/hudi/pull/6999#discussion_r1002867035


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/RepairHoodieTableCommand.scala:
##
@@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi.command
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileSystem, Path, PathFilter}
+import org.apache.hadoop.mapred.{FileInputFormat, JobConf}
+
+import org.apache.hudi.common.table.HoodieTableConfig
+
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
+import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.execution.command.PartitionStatistics
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.util.{SerializableConfiguration, ThreadUtils}
+
+import java.util.concurrent.TimeUnit.MILLISECONDS
+
+import scala.util.control.NonFatal
+
+/**
+ * Command for repair hudi table's partitions.
+ * Use hoodieCatalogTable.getPartitionPaths() to get partitions instead of 
scanning the file system.
+ */
+case class RepairHoodieTableCommand(tableName: TableIdentifier,
+enableAddPartitions: Boolean,
+enableDropPartitions: Boolean,
+cmd: String = "MSCK REPAIR TABLE") extends 
HoodieLeafRunnableCommand {
+
+  // These are list of statistics that can be collected quickly without 
requiring a scan of the data
+  // see https://github.com/apache/hive/blob/master/
+  //   common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java
+  val NUM_FILES = "numFiles"
+  val TOTAL_SIZE = "totalSize"
+  val DDL_TIME = "transient_lastDdlTime"
+
+  private def getPathFilter(hadoopConf: Configuration): PathFilter = {
+// Dummy jobconf to get to the pathFilter defined in configuration
+// It's very expensive to create a JobConf(ClassUtil.findContainingJar() 
is slow)
+val jobConf = new JobConf(hadoopConf, this.getClass)
+val pathFilter = FileInputFormat.getInputPathFilter(jobConf)
+new PathFilter {
+  override def accept(path: Path): Boolean = {
+val name = path.getName
+if (name != "_SUCCESS" && name != "_temporary" && 
!name.startsWith(".")) {
+  pathFilter == null || pathFilter.accept(path)
+} else {
+  false
+}
+  }
+}
+  }
+
+  override def run(spark: SparkSession): Seq[Row] = {
+val catalog = spark.sessionState.catalog
+val table = catalog.getTableMetadata(tableName)
+val tableIdentWithDB = table.identifier.quotedString
+if (table.partitionColumnNames.isEmpty) {
+  throw new AnalysisException(
+s"Operation not allowed: $cmd only works on partitioned tables: 
$tableIdentWithDB")
+}
+
+if (table.storage.locationUri.isEmpty) {
+  throw new AnalysisException(s"Operation not allowed: $cmd only works on 
table with " +
+s"location provided: $tableIdentWithDB")
+}
+
+val root = new Path(table.location)
+logInfo(s"Recover all the partitions in $root")
+
+val hoodieCatalogTable = HoodieCatalogTable(spark, table.identifier)
+val isHiveStyledPartitioning = hoodieCatalogTable.catalogProperties.
+  getOrElse(HoodieTableConfig.HIVE_STYLE_PARTITIONING_ENABLE.key, 
"true").equals("true")

Review Comment:
   `toBoolean` can work.
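
For reference, a minimal Java sketch of how an exact `equals("true")` comparison differs from a case-insensitive boolean parse. The class name and sample values are hypothetical and not part of the PR. As far as I know, Scala's `String.toBoolean` is also case-insensitive but throws `IllegalArgumentException` for input that is neither `true` nor `false`, whereas Java's `Boolean.parseBoolean` simply returns `false`.

```java
public class HiveStyleFlag {
  // Exact comparison, as in the quoted diff: only the literal "true" enables the flag.
  static boolean byEquals(String raw) {
    return "true".equals(raw);
  }

  // Case-insensitive parse: "TRUE" and "True" also enable the flag.
  static boolean byParse(String raw) {
    return Boolean.parseBoolean(raw);
  }

  public static void main(String[] args) {
    for (String raw : new String[] {"true", "TRUE", "false", "yes"}) {
      System.out.println(raw + " -> equals=" + byEquals(raw) + ", parse=" + byParse(raw));
    }
  }
}
```

The two helpers only disagree on mixed-case spellings of `true`, so the choice matters mainly when the property value is user-supplied.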






[GitHub] [hudi] xushiyan commented on pull request #7042: [MINOR] Improve the cdc log file name format

2022-10-23 Thread GitBox


xushiyan commented on PR #7042:
URL: https://github.com/apache/hudi/pull/7042#issuecomment-1288385370

   @YannByron can you file a JIRA please?





[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

2022-10-23 Thread GitBox


trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002865594


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java:
##
@@ -99,10 +113,36 @@ public CopyOnWriteInputFormat(
 this.selectedFields = selectedFields;
 this.conf = new SerializableConfiguration(conf);
 this.utcTimestamp = utcTimestamp;
+this.schemaEvolutionContext = SchemaEvolutionContext.of(flinkConf);
   }
 
   @Override
   public void open(FileInputSplit fileSplit) throws IOException {
+String[] actualFieldNames;
+DataType[] actualFieldTypes;
+if (schemaEvolutionContext.isPresent()) {
+  SchemaEvolutionContext context = schemaEvolutionContext.get();
+  InternalSchema actualSchema = context.getActualSchema(fileSplit);

Review Comment:
   Moved this logic to a separate method, `setActualFields`.






[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

2022-10-23 Thread GitBox


trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002864979


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##
@@ -135,6 +139,12 @@
*/
   private boolean closed = true;
 
+  private final Option<SchemaEvolutionContext> schemaEvolutionContext;
+  private List<String> actualFieldNames;
+  private List<DataType> actualFieldTypes;
+  private InternalSchema actualSchema;
+  private InternalSchema querySchema;

Review Comment:
   The fields `actualFieldNames`, `actualFieldTypes`, `actualSchema`, and 
`querySchema` have been removed.






[GitHub] [hudi] YannByron commented on a diff in pull request #6999: [HUDI-5057] Fix msck repair hudi table

2022-10-23 Thread GitBox


YannByron commented on code in PR #6999:
URL: https://github.com/apache/hudi/pull/6999#discussion_r1002864497


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestRepairTable.scala:
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import org.apache.hudi.DataSourceWriteOptions.{PARTITIONPATH_FIELD, 
PRECOMBINE_FIELD, RECORDKEY_FIELD}
+import org.apache.hudi.HoodieSparkUtils
+import 
org.apache.hudi.common.table.HoodieTableConfig.HIVE_STYLE_PARTITIONING_ENABLE
+import org.apache.hudi.config.HoodieWriteConfig.TBL_NAME
+
+import org.apache.spark.sql.SaveMode
+
+class TestRepairTable extends HoodieSparkSqlTestBase {
+
+  test("Test msck repair non-partitioned table") {
+Seq("true", "false").foreach { hiveStylePartitionEnable =>
+  withTempDir { tmp =>
+val tableName = generateTableName
+val basePath = s"${tmp.getCanonicalPath}/$tableName"
+spark.sql(
+  s"""
+ | create table $tableName (
+ |  id int,
+ |  name string,
+ |  ts long,
+ |  dt string,
+ |  hh string
+ | ) using hudi
+ | location '$basePath'
+ | tblproperties (
+ |  primaryKey = 'id',
+ |  preCombineField = 'ts',
+ |  hoodie.datasource.write.hive_style_partitioning = 
'$hiveStylePartitionEnable'
+ | )
+""".stripMargin)
+
+checkExceptionContain(s"msck repair table $tableName")(
+  s"Operation not allowed")
+  }
+}
+  }
+
+  test("Test msck repair partitioned table") {
+Seq("true", "false").foreach { hiveStylePartitionEnable =>
+  withTempDir { tmp =>
+val tableName = generateTableName
+val basePath = s"${tmp.getCanonicalPath}/$tableName"
+spark.sql(
+  s"""
+ | create table $tableName (
+ |  id int,
+ |  name string,
+ |  ts long,
+ |  dt string,
+ |  hh string
+ | ) using hudi
+ | partitioned by (dt, hh)
+ | location '$basePath'
+ | tblproperties (
+ |  primaryKey = 'id',
+ |  preCombineField = 'ts',
+ |  hoodie.datasource.write.hive_style_partitioning = 
'$hiveStylePartitionEnable'
+ | )
+""".stripMargin)
+val table = 
spark.sessionState.sqlParser.parseTableIdentifier(tableName)
+
+import spark.implicits._
+val df = Seq((1, "a1", 1000, "2022-10-06", "11"), (2, "a2", 1001, 
"2022-10-06", "12"))
+  .toDF("id", "name", "ts", "dt", "hh")
+df.write.format("hudi")
+  .option(RECORDKEY_FIELD.key, "id")
+  .option(PRECOMBINE_FIELD.key, "ts")
+  .option(PARTITIONPATH_FIELD.key, "dt, hh")
+  .option(HIVE_STYLE_PARTITIONING_ENABLE.key, hiveStylePartitionEnable)
+  .mode(SaveMode.Append)
+  .save(basePath)
+
+
assertResult(Seq())(spark.sessionState.catalog.listPartitionNames(table))
+spark.sql(s"msck repair table $tableName")
+spark.sql(s"msck repair table $tableName")

Review Comment:
   Why execute this SQL twice?






[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

2022-10-23 Thread GitBox


trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002864355


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/FormatUtils.java:
##
@@ -130,6 +132,7 @@ public static HoodieMergedLogRecordScanner logScanner(
 .withBasePath(split.getTablePath())
 .withLogFilePaths(split.getLogPaths().get())
 .withReaderSchema(logSchema)
+.withInternalSchema(internalSchema)
 .withLatestInstantTime(split.getLatestCommit())

Review Comment:
   Fixed






[GitHub] [hudi] trushev commented on a diff in pull request #5830: [HUDI-3981][RFC-33] Flink engine support for comprehensive schema evolution

2022-10-23 Thread GitBox


trushev commented on code in PR #5830:
URL: https://github.com/apache/hudi/pull/5830#discussion_r1002864150


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieUnMergedLogRecordScanner.java:
##
@@ -135,10 +137,15 @@ public Builder 
withLogRecordScannerCallback(LogRecordScannerCallback callback) {
   return this;
 }
 
+public Builder withInternalSchema(InternalSchema internalSchema) {
+  this.internalSchema = internalSchema;
+  return this;

Review Comment:
   I reverted the changes in `HoodieMergedLogRecordScanner`. Now there is only 
one schema, `InternalSchema`, which wraps `org.apache.avro.Schema`. The same 
approach is used in `HoodieUnMergedLogRecordScanner`.
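
A rough Java sketch of the builder shape discussed here, with a single schema option. `ScannerSketch` and `SchemaSketch` are hypothetical stand-ins for illustration only; they are not the real Hudi scanner or `InternalSchema` classes, and the wrapped schema is reduced to a plain JSON string.

```java
import java.util.Objects;

public class ScannerSketch {
  /** Hypothetical stand-in for a schema wrapper; holds the wrapped schema as JSON text. */
  static final class SchemaSketch {
    final String wrappedAvroJson;

    SchemaSketch(String wrappedAvroJson) {
      this.wrappedAvroJson = Objects.requireNonNull(wrappedAvroJson);
    }
  }

  private final SchemaSketch schema;

  private ScannerSketch(SchemaSketch schema) {
    this.schema = schema;
  }

  SchemaSketch schema() {
    return schema;
  }

  static Builder newBuilder() {
    return new Builder();
  }

  static final class Builder {
    private SchemaSketch internalSchema;

    // Mirrors the fluent setter in the quoted diff: store the value, return the builder.
    Builder withInternalSchema(SchemaSketch internalSchema) {
      this.internalSchema = internalSchema;
      return this;
    }

    // Fails fast if the single required schema was never supplied.
    ScannerSketch build() {
      return new ScannerSketch(Objects.requireNonNull(internalSchema, "internalSchema"));
    }
  }

  public static void main(String[] args) {
    ScannerSketch scanner = ScannerSketch.newBuilder()
        .withInternalSchema(new SchemaSketch("{\"type\":\"string\"}"))
        .build();
    System.out.println(scanner.schema().wrappedAvroJson);
  }
}
```

With only one schema field on the builder, callers cannot pass two schemas that disagree with each other.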






[GitHub] [hudi] YannByron commented on a diff in pull request #6999: [HUDI-5057] Fix msck repair hudi table

2022-10-23 Thread GitBox


YannByron commented on code in PR #6999:
URL: https://github.com/apache/hudi/pull/6999#discussion_r1002864134


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestRepairTable.scala:
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import org.apache.hudi.DataSourceWriteOptions.{PARTITIONPATH_FIELD, 
PRECOMBINE_FIELD, RECORDKEY_FIELD}
+import org.apache.hudi.HoodieSparkUtils
+import 
org.apache.hudi.common.table.HoodieTableConfig.HIVE_STYLE_PARTITIONING_ENABLE
+import org.apache.hudi.config.HoodieWriteConfig.TBL_NAME
+
+import org.apache.spark.sql.SaveMode
+
+class TestRepairTable extends HoodieSparkSqlTestBase {
+
+  test("Test msck repair non-partitioned table") {
+Seq("true", "false").foreach { hiveStylePartitionEnable =>
+  withTempDir { tmp =>
+val tableName = generateTableName
+val basePath = s"${tmp.getCanonicalPath}/$tableName"
+spark.sql(
+  s"""
+ | create table $tableName (
+ |  id int,
+ |  name string,
+ |  ts long,
+ |  dt string,
+ |  hh string
+ | ) using hudi
+ | location '$basePath'
+ | tblproperties (
+ |  primaryKey = 'id',
+ |  preCombineField = 'ts',
+ |  hoodie.datasource.write.hive_style_partitioning = 
'$hiveStylePartitionEnable'
+ | )
+""".stripMargin)
+
+checkExceptionContain(s"msck repair table $tableName")(
+  s"Operation not allowed")
+  }
+}
+  }
+
+  test("Test msck repair partitioned table") {
+Seq("true", "false").foreach { hiveStylePartitionEnable =>
+  withTempDir { tmp =>
+val tableName = generateTableName
+val basePath = s"${tmp.getCanonicalPath}/$tableName"
+spark.sql(
+  s"""
+ | create table $tableName (
+ |  id int,
+ |  name string,
+ |  ts long,
+ |  dt string,
+ |  hh string
+ | ) using hudi
+ | partitioned by (dt, hh)
+ | location '$basePath'
+ | tblproperties (
+ |  primaryKey = 'id',
+ |  preCombineField = 'ts',
+ |  hoodie.datasource.write.hive_style_partitioning = 
'$hiveStylePartitionEnable'
+ | )
+""".stripMargin)
+val table = 
spark.sessionState.sqlParser.parseTableIdentifier(tableName)
+
+import spark.implicits._
+val df = Seq((1, "a1", 1000, "2022-10-06", "11"), (2, "a2", 1001, "2022-10-06", "12"))
+  .toDF("id", "name", "ts", "dt", "hh")
+df.write.format("hudi")
+  .option(RECORDKEY_FIELD.key, "id")
+  .option(PRECOMBINE_FIELD.key, "ts")
+  .option(PARTITIONPATH_FIELD.key, "dt, hh")
+  .option(HIVE_STYLE_PARTITIONING_ENABLE.key, hiveStylePartitionEnable)
+  .mode(SaveMode.Append)
+  .save(basePath)
+
+assertResult(Seq())(spark.sessionState.catalog.listPartitionNames(table))
+spark.sql(s"msck repair table $tableName")
+assertResult(Seq("dt=2022-10-06/hh=11", "dt=2022-10-06/hh=12"))(spark.sessionState.catalog.listPartitionNames(table))

Review Comment:
   nit:  code format.






[GitHub] [hudi] YannByron commented on a diff in pull request #6999: [HUDI-5057] Fix msck repair hudi table

2022-10-23 Thread GitBox


YannByron commented on code in PR #6999:
URL: https://github.com/apache/hudi/pull/6999#discussion_r1002863123


##
hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/HoodieSpark2CatalystPlanUtils.scala:
##
@@ -74,4 +74,15 @@ object HoodieSpark2CatalystPlanUtils extends HoodieCatalystPlansUtils {
   override def getRelationTimeTravel(plan: LogicalPlan): Option[(LogicalPlan, Option[Expression], Option[String])] = {
 throw new IllegalStateException(s"Should not call getRelationTimeTravel for spark2")
   }
+
+  override def isRepairTable(plan: LogicalPlan): Boolean = {
+plan.isInstanceOf[AlterTableRecoverPartitionsCommand]
+  }
+
+  override def getRepairTableChildren(plan: LogicalPlan): Option[(TableIdentifier, Boolean, Boolean, String)] = {
+plan match {
+  case c: AlterTableRecoverPartitionsCommand =>
+Some((c.tableName, true, false, c.cmd))

Review Comment:
   Please explain in the code why `true` is used as the default value of the 2nd 
parameter and `false` as the default value of the 3rd.



##
hudi-spark-datasource/hudi-spark3.1.x/src/main/scala/org/apache/spark/sql/HoodieSpark31CatalystPlanUtils.scala:
##
@@ -31,4 +33,15 @@ object HoodieSpark31CatalystPlanUtils extends HoodieSpark3CatalystPlanUtils {
   }
 
   override def projectOverSchema(schema: StructType, output: AttributeSet): ProjectionOverSchema = ProjectionOverSchema(schema)
+
+  override def isRepairTable(plan: LogicalPlan): Boolean = {
+plan.isInstanceOf[AlterTableRecoverPartitionsCommand]
+  }
+
+  override def getRepairTableChildren(plan: LogicalPlan): Option[(TableIdentifier, Boolean, Boolean, String)] = {
+plan match {
+  case c: AlterTableRecoverPartitionsCommand =>
+Some((c.tableName, true, false, c.cmd))

Review Comment:
   ditto






[GitHub] [hudi] YannByron opened a new pull request, #7042: [MINOR] optimize the cdc log file name

2022-10-23 Thread GitBox


YannByron opened a new pull request, #7042:
URL: https://github.com/apache/hudi/pull/7042

   ### Change Logs
   
   optimize the cdc log file name
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7001: [HUDI-5061] bulk insert operation don't throw other exception except IOE Exception

2022-10-23 Thread GitBox


hudi-bot commented on PR #7001:
URL: https://github.com/apache/hudi/pull/7001#issuecomment-1288371602

   
   ## CI report:
   
   * 03cb91b295d74d1fa7daf73592e53df76c84bc85 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12485)
 
   * 67282ced98d0531a1096bcc418c0126836d0fb51 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] liufangqi commented on pull request #7001: [HUDI-5061] bulk insert operation don't throw other exception except IOE Exception

2022-10-23 Thread GitBox


liufangqi commented on PR #7001:
URL: https://github.com/apache/hudi/pull/7001#issuecomment-1288368502

   > rebased w/ latest master.
   
   @nsivabalan Thanks for the reminder. I have rebased and squashed the commits. 
Please check it again when you are free.





[GitHub] [hudi] hudi-bot commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto

2022-10-23 Thread GitBox


hudi-bot commented on PR #6989:
URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288368335

   
   ## CI report:
   
   * 11d8108e89bc1de462978acbaee3905f9cb9edba Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12447)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12503)
 
   
   



[GitHub] [hudi] MihawkZoro commented on issue #7040: [SUPPORT] spark-sql schema_evolution

2022-10-23 Thread GitBox


MihawkZoro commented on issue #7040:
URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288368315

   @xiarixiaoyao Thank you. When will an official Spark bundle jar containing the 
fix be released?





[GitHub] [hudi] xiarixiaoyao commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto

2022-10-23 Thread GitBox


xiarixiaoyao commented on PR #6989:
URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288366080

   @hudi-bot run azure





[GitHub] [hudi] xiarixiaoyao commented on issue #7040: [SUPPORT] spark-sql schema_evolution

2022-10-23 Thread GitBox


xiarixiaoyao commented on issue #7040:
URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288365585

   @MihawkZoro Schema evolution for Hive and Presto (MOR tables) can be found in 
https://github.com/apache/hudi/pull/6989





[GitHub] [hudi] hudi-bot commented on pull request #7018: [HUDI-5067] Merge the columns stats of multiple log blocks from the s…

2022-10-23 Thread GitBox


hudi-bot commented on PR #7018:
URL: https://github.com/apache/hudi/pull/7018#issuecomment-1288365113

   
   ## CI report:
   
   * 622fb9f5639ace8e15db0a778dfcb03b4c059ca8 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12422)
 
   * 2fd1d5ab5b34cdf0b5f9042e38efccd6b8091a60 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12501)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7020: [HUDI-5069] TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky

2022-10-23 Thread GitBox


hudi-bot commented on PR #7020:
URL: https://github.com/apache/hudi/pull/7020#issuecomment-1288365144

   
   ## CI report:
   
   * 04d5ca0a11f5f8568f2d389dc4ec60468c04b596 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12488)
 
   * 23754c9d84c66721016d846ca6a20614626baa35 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12502)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7020: [HUDI-5069] TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky

2022-10-23 Thread GitBox


hudi-bot commented on PR #7020:
URL: https://github.com/apache/hudi/pull/7020#issuecomment-1288361616

   
   ## CI report:
   
   * 04d5ca0a11f5f8568f2d389dc4ec60468c04b596 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12488)
 
   * 23754c9d84c66721016d846ca6a20614626baa35 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7018: [HUDI-5067] Merge the columns stats of multiple log blocks from the s…

2022-10-23 Thread GitBox


hudi-bot commented on PR #7018:
URL: https://github.com/apache/hudi/pull/7018#issuecomment-1288361577

   
   ## CI report:
   
   * 622fb9f5639ace8e15db0a778dfcb03b4c059ca8 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12422)
 
   * 2fd1d5ab5b34cdf0b5f9042e38efccd6b8091a60 UNKNOWN
   
   



[GitHub] [hudi] xiarixiaoyao commented on issue #7040: [SUPPORT] spark-sql schema_evolution

2022-10-23 Thread GitBox


xiarixiaoyao commented on issue #7040:
URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288359886

   Already fixed locally; let me raise a PR.
 spark.sql("set hoodie.schema.on.read.enable=true")
 spark.sql("""create table ddl_test_t2 (
 |  col1 string,
 |  col2 string,
 |  col3 string,
 |  ts bigint
 |) using hudi
 |tblproperties (
 |  type = 'mor',
 |  primaryKey = 'col1',
 |  preCombineField = 'ts'
 |)""".stripMargin)
   
 spark.sql("insert into ddl_test_t2 values('1','col2','col3',1),('2','col2','col3',2),('3','col2','col3',3)")
 spark.sql("""ALTER TABLE ddl_test_t2 DROP COLUMN col3""")
 spark.sql("ALTER TABLE ddl_test_t2 RENAME COLUMN col2 to col3")
 spark.sql("insert into ddl_test_t2 values('4','col2',4)")
 spark.sql("select col3 from ddl_test_t2").show(false)
   
   +----+
   |col3|
   +----+
   |col2|
   |col2|
   |col2|
   |col2|
   +----+





[GitHub] [hudi] MihawkZoro commented on issue #7040: [SUPPORT] spark-sql schema_evolution

2022-10-23 Thread GitBox


MihawkZoro commented on issue #7040:
URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288357272

   @xiarixiaoyao When will this bug be fixed? We are using this feature, so it is 
urgent.





[GitHub] [hudi] xiarixiaoyao commented on issue #7040: [SUPPORT] spark-sql schema_evolution

2022-10-23 Thread GitBox


xiarixiaoyao commented on issue #7040:
URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288348142

   `rewriteRecordWithNewSchema` fails to handle the rename; it should handle the 
rename first.
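
   The ordering issue can be illustrated with a minimal sketch. This is not Hudi's `rewriteRecordWithNewSchema`; the class, method, and parameter names below are hypothetical, and the record is modelled as a plain map. It only shows why resolving the rename before a same-name lookup matters for the `DROP COLUMN col3` followed by `RENAME COLUMN col2 to col3` sequence: a same-name lookup would pick up the old, dropped `col3` value.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Hudi's actual implementation.
final class RenameFirstRewriteSketch {

  /**
   * Projects an old record onto the new field list.
   * renamedNewToOld maps a new field name to the name it had in the old schema.
   */
  static Map<String, Object> rewrite(Map<String, Object> oldRecord,
                                     List<String> newFields,
                                     Map<String, String> renamedNewToOld) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (String field : newFields) {
      // Resolve the rename first; fall back to a same-name lookup
      // only when the field was not renamed.
      String source = renamedNewToOld.getOrDefault(field, field);
      out.put(field, oldRecord.get(source));
    }
    return out;
  }

  public static void main(String[] args) {
    // A record written before the DROP/RENAME, so it still carries the old col3.
    Map<String, Object> oldRecord = new LinkedHashMap<>();
    oldRecord.put("col1", "1");
    oldRecord.put("col2", "col2");
    oldRecord.put("col3", "col3");
    oldRecord.put("ts", 1L);

    Map<String, Object> rewritten = rewrite(
        oldRecord,
        Arrays.asList("col1", "col3", "ts"),
        Collections.singletonMap("col3", "col2"));

    // The new col3 carries the value of the old col2, not the dropped col3.
    System.out.println(rewritten.get("col3"));
  }
}
```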





[GitHub] [hudi] eric9204 commented on issue #6965: [SUPPORT]Data can be found in the latest partition of hudi table, but not in the historical partition.

2022-10-23 Thread GitBox


eric9204 commented on issue #6965:
URL: https://github.com/apache/hudi/issues/6965#issuecomment-1288338928

   > @eric9204 I tried to reproduce this problem but failed. Could you please 
   > provide some dummy data?
   
   @xiarixiaoyao Sorry, I don't know what you mean. Do you need the Hoodie Parquet 
files on HDFS, or something like this:
   
![image](https://user-images.githubusercontent.com/90449228/197438688-3f7e87b0-55fc-4571-a117-a6c048f2cfc5.png)
   
![image](https://user-images.githubusercontent.com/90449228/197438831-c8db835d-f88a-4712-be1d-66119779cdc7.png)
   





[GitHub] [hudi] xicm commented on pull request #7020: [HUDI-5069] TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky

2022-10-23 Thread GitBox


xicm commented on PR #7020:
URL: https://github.com/apache/hudi/pull/7020#issuecomment-1288335284

   This test fails on my laptop but succeeds on Azure. I guess my laptop is too 
slow: the elapsed time reaches INLINE_COMPACT_TIME_DELTA_SECONDS sooner than the 
test expects, so the compaction is executed earlier.
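
   A minimal sketch of why a wall-clock threshold makes such a test machine-dependent, and of one common way to remove that dependency by injecting a clock. The class and method names are hypothetical (not Hudi APIs), and it assumes a "num and time" trigger fires only when both the delta-commit count and the elapsed seconds reach their thresholds. It is not necessarily what this PR does.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical sketch; names are illustrative, not Hudi APIs.
final class NumAndTimeTriggerSketch {
  private final Clock clock;
  private final int maxDeltaCommits;
  private final long maxDeltaSeconds;

  NumAndTimeTriggerSketch(Clock clock, int maxDeltaCommits, long maxDeltaSeconds) {
    this.clock = clock;
    this.maxDeltaCommits = maxDeltaCommits;
    this.maxDeltaSeconds = maxDeltaSeconds;
  }

  /** Fires only when both the commit-count and the elapsed-time thresholds are met. */
  boolean shouldCompact(int deltaCommitsSinceLastCompaction, Instant lastCompaction) {
    long elapsedSeconds = Duration.between(lastCompaction, clock.instant()).getSeconds();
    return deltaCommitsSinceLastCompaction >= maxDeltaCommits
        && elapsedSeconds >= maxDeltaSeconds;
  }

  public static void main(String[] args) {
    Instant last = Instant.parse("2022-10-23T00:00:00Z");
    // Fixed clocks make the elapsed time independent of how fast the machine is.
    Clock tenSecondsLater = Clock.fixed(last.plusSeconds(10), ZoneOffset.UTC);
    Clock sixtySecondsLater = Clock.fixed(last.plusSeconds(60), ZoneOffset.UTC);

    // Enough commits, but the time threshold is not reached yet.
    System.out.println(new NumAndTimeTriggerSketch(tenSecondsLater, 3, 60).shouldCompact(3, last));
    // Both thresholds reached.
    System.out.println(new NumAndTimeTriggerSketch(sixtySecondsLater, 3, 60).shouldCompact(3, last));
  }
}
```

   With a system clock in place of the fixed ones, the second condition depends on how long the preceding writes took, which is the behaviour described above.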





[GitHub] [hudi] xiarixiaoyao commented on issue #7040: [SUPPORT] spark-sql schema_evolution

2022-10-23 Thread GitBox


xiarixiaoyao commented on issue #7040:
URL: https://github.com/apache/hudi/issues/7040#issuecomment-1288334741

   @MihawkZoro   
   Thank you for your test.
This is indeed a bug: the final write `insert into ddl_test_t2 
values('4','col2',4);` triggers it. I will fix it as soon as possible.





[GitHub] [hudi] KnightChess commented on a diff in pull request #6824: [HUDI-4946] fix merge into with no preCombineField has dup row by onl…

2022-10-23 Thread GitBox


KnightChess commented on code in PR #6824:
URL: https://github.com/apache/hudi/pull/6824#discussion_r1002828098


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##
@@ -160,7 +167,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
 
   // column order changed after left anti join , we should keep column order of source dataframe
   val cols = removeMetaFields(sourceDF).columns
-  executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), parameters)
+  executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), writeParam)

Review Comment:
   @YannByron Sorry, I will add this in the next few days.






[GitHub] [hudi] KnightChess commented on a diff in pull request #6824: [HUDI-4946] fix merge into with no preCombineField has dup row by onl…

2022-10-23 Thread GitBox


KnightChess commented on code in PR #6824:
URL: https://github.com/apache/hudi/pull/6824#discussion_r1002827833


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##
@@ -160,7 +167,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
 
   // column order changed after left anti join , we should keep column order of source dataframe
   val cols = removeMetaFields(sourceDF).columns
-  executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), parameters)
+  executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), writeParam)

Review Comment:
   @xushiyan I think `executeInsertOnly` and `executeUpsert` are different from the 
Hudi operations `insert` and `upsert`; they are just conditional branches for the 
`merge into` SQL. As for the SQL semantics, I think `merge into` should only be 
used for the `upsert` operation, and should not even follow the Hudi 
`precombineKey`, because `merge into` SQL has a lot of flexibility to update the 
records we want.






[GitHub] [hudi] hudi-bot commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto

2022-10-23 Thread GitBox


hudi-bot commented on PR #6989:
URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288312080

   
   ## CI report:
   
   * 11d8108e89bc1de462978acbaee3905f9cb9edba Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12447)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7000: [HUDI-5060] Make all clean policies support incremental mode to find partition paths

2022-10-23 Thread GitBox


hudi-bot commented on PR #7000:
URL: https://github.com/apache/hudi/pull/7000#issuecomment-1288312127

   
   ## CI report:
   
   * c9fe9314c82cc42ac497f47ccc8a53ae266beb55 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12462)
 
   * f0c09d506905d6e80f109b900e6e04bacffec4e6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12500)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex

2022-10-23 Thread GitBox


hudi-bot commented on PR #6680:
URL: https://github.com/apache/hudi/pull/6680#issuecomment-1288311641

   
   ## CI report:
   
   * a09114e1c326791e33e910b2f660aaa6882dcfc9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12493)
 
   
   



[GitHub] [hudi] xiarixiaoyao commented on issue #6965: [SUPPORT]Data can be found in the latest partition of hudi table, but not in the historical partition.

2022-10-23 Thread GitBox


xiarixiaoyao commented on issue #6965:
URL: https://github.com/apache/hudi/issues/6965#issuecomment-1288309523

   @eric9204 I tried to reproduce this problem but failed. Could you please 
provide some dummy data?





[GitHub] [hudi] hudi-bot commented on pull request #7000: [HUDI-5060] Make all clean policies support incremental mode to find partition paths

2022-10-23 Thread GitBox


hudi-bot commented on PR #7000:
URL: https://github.com/apache/hudi/pull/7000#issuecomment-1288307439

   
   ## CI report:
   
   * c9fe9314c82cc42ac497f47ccc8a53ae266beb55 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12462)
 
   * f0c09d506905d6e80f109b900e6e04bacffec4e6 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto

2022-10-23 Thread GitBox


hudi-bot commented on PR #6989:
URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288307376

   
   ## CI report:
   
   * 11d8108e89bc1de462978acbaee3905f9cb9edba Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12447)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex

2022-10-23 Thread GitBox


hudi-bot commented on PR #6680:
URL: https://github.com/apache/hudi/pull/6680#issuecomment-1288306886

   
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * a09114e1c326791e33e910b2f660aaa6882dcfc9 UNKNOWN
   
   



[GitHub] [hudi] xiarixiaoyao commented on pull request #6989: [HUDI-5000] Support schema evolution for Hive/presto

2022-10-23 Thread GitBox


xiarixiaoyao commented on PR #6989:
URL: https://github.com/apache/hudi/pull/6989#issuecomment-1288306078

   @hudi-bot run azur





[GitHub] [hudi] YuweiXiao commented on pull request #6680: [HUDI-4812] lazy fetching partition path & file slice for HoodieFileIndex

2022-10-23 Thread GitBox


YuweiXiao commented on PR #6680:
URL: https://github.com/apache/hudi/pull/6680#issuecomment-1288289575

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #7033: [MINOR] test cleanup

2022-10-23 Thread GitBox


hudi-bot commented on PR #7033:
URL: https://github.com/apache/hudi/pull/7033#issuecomment-1288232199

   
   ## CI report:
   
   * c4dcf26eba06562edb428e668fccbc94ed48f07b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12499)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-10-23 Thread GitBox


hudi-bot commented on PR #7041:
URL: https://github.com/apache/hudi/pull/7041#issuecomment-1288228417

   
   ## CI report:
   
   * acd416d779132b9fd7a7b1fe58eaaeebcf1b821f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12498)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7003: [minor] add more test for rfc46

2022-10-23 Thread GitBox


hudi-bot commented on PR #7003:
URL: https://github.com/apache/hudi/pull/7003#issuecomment-1288211985

   
   ## CI report:
   
   * 77ff687b1e0e945d6658ffe47992bd85484d78b2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12497)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6952: [HUDI-5035] Remove usage of deprecated HoodieTimer constructor

2022-10-23 Thread GitBox


hudi-bot commented on PR #6952:
URL: https://github.com/apache/hudi/pull/6952#issuecomment-1288211938

   
   ## CI report:
   
   * 41b6c99a662d2361e5f351079dc06c61f507f791 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12495)
 
   
   



[jira] [Assigned] (HUDI-5056) Add support to DELETE_PARTITIONS w/ wild card

2022-10-23 Thread Hussein Awala (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hussein Awala reassigned HUDI-5056:
---

Assignee: Hussein Awala

> Add support to DELETE_PARTITIONS w/ wild card
> -
>
> Key: HUDI-5056
> URL: https://issues.apache.org/jira/browse/HUDI-5056
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Hussein Awala
>Priority: Major
>
> As of now, DELETE_PARTITIONS expects a comma-separated list of partitions to 
> delete. We would also like to support wildcards. 
> For example,
> year=2022/month=10/day=05/*
> assuming hour-based partitioning.
>  
> Ref: https://github.com/apache/hudi/issues/6866
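
The wildcard matching requested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Hudi's implementation: the class `WildcardPartitions` and its methods `toRegex` and `expand` are hypothetical names, the list of known partition paths is assumed to be supplied by the caller, and `*` is assumed to match within a single path segment only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hudi: expands wildcard entries in a
// comma-separated partition spec against a list of known partition paths.
final class WildcardPartitions {

  // Translate a partition pattern into a regex in which '*' matches any run
  // of characters inside one path segment (it never crosses a '/').
  static Pattern toRegex(String pattern) {
    String[] literals = pattern.split("\\*", -1);
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < literals.length; i++) {
      if (i > 0) {
        regex.append("[^/]*");
      }
      // Quote the literal parts so characters such as '=' or '.' are not
      // interpreted as regex syntax.
      regex.append(Pattern.quote(literals[i]));
    }
    return Pattern.compile(regex.toString());
  }

  // Return the known partitions matched by any entry of the spec, in spec
  // order and without duplicates.
  static List<String> expand(String spec, List<String> knownPartitions) {
    List<String> matched = new ArrayList<>();
    for (String entry : spec.split(",")) {
      Pattern p = toRegex(entry.trim());
      for (String partition : knownPartitions) {
        if (p.matcher(partition).matches() && !matched.contains(partition)) {
          matched.add(partition);
        }
      }
    }
    return matched;
  }
}
```

With hour-based partitions, `expand("year=2022/month=10/day=05/*", known)` would select only the hourly partitions under day 05, while plain comma-separated entries without a wildcard keep behaving as exact matches.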



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #7033: [MINOR] test cleanup

2022-10-23 Thread GitBox


hudi-bot commented on PR #7033:
URL: https://github.com/apache/hudi/pull/7033#issuecomment-1288200718

   
   ## CI report:
   
   * 94bac61f150805e4ec82f3d5bf44b55699257534 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12471)
 
   * c4dcf26eba06562edb428e668fccbc94ed48f07b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12499)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7033: [MINOR] test cleanup

2022-10-23 Thread GitBox


hudi-bot commented on PR #7033:
URL: https://github.com/apache/hudi/pull/7033#issuecomment-1288199741

   
   ## CI report:
   
   * 94bac61f150805e4ec82f3d5bf44b55699257534 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12471)
 
   * c4dcf26eba06562edb428e668fccbc94ed48f07b UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6946: [HUDI-5027] Improve getHBaseConnection Use Constants Replace HardCode.

2022-10-23 Thread GitBox


hudi-bot commented on PR #6946:
URL: https://github.com/apache/hudi/pull/6946#issuecomment-1288198608

   
   ## CI report:
   
   * 86099181bd76a59cdd1b537eb724f6f51ed0c711 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12494)
 
   
   



[GitHub] [hudi] the-other-tim-brown commented on a diff in pull request #7033: [MINOR] test cleanup

2022-10-23 Thread GitBox


the-other-tim-brown commented on code in PR #7033:
URL: https://github.com/apache/hudi/pull/7033#discussion_r1002767620


##
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieIndexer.java:
##
@@ -100,6 +101,16 @@ public void init() throws IOException {
 initMetaClient();
   }
 
+  @AfterAll
+  public static void cleanup() {

Review Comment:
   Yes, I've updated the code 






[GitHub] [hudi] ft-bazookanu commented on issue #6970: [SUPPORT] Performance of Snapshot Exporter

2022-10-23 Thread GitBox


ft-bazookanu commented on issue #6970:
URL: https://github.com/apache/hudi/issues/6970#issuecomment-1288194206

   Please see https://hudi.apache.org/docs/snapshot_exporter/ - partitioner 
configs are ignored when the output format is hudi. Moreover, we're using this 
as a backup and do not want to repartition. I feel my issue is orthogonal to 
partitioning:
   - Why does performance _decrease_ when increasing memory/cores per executor?
   - Why does performance saturate at 16 executors, although the table has far 
more than 16 partitions?
   
   Most of the time is spent exporting the contents of `.hoodie/`, which 
appears to be happening serially (not in parallel).





[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-10-23 Thread GitBox


hudi-bot commented on PR #7041:
URL: https://github.com/apache/hudi/pull/7041#issuecomment-1288185718

   
   ## CI report:
   
   * 3300f7bdbf9d1cb178390d36523db2ec0279448c Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12496)
 
   * acd416d779132b9fd7a7b1fe58eaaeebcf1b821f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12498)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-10-23 Thread GitBox


hudi-bot commented on PR #7041:
URL: https://github.com/apache/hudi/pull/7041#issuecomment-1288184587

   
   ## CI report:
   
   * 3300f7bdbf9d1cb178390d36523db2ec0279448c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12496)
 
   * acd416d779132b9fd7a7b1fe58eaaeebcf1b821f UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6660: [MINOR] Skip loading last completed txn for single writer

2022-10-23 Thread GitBox


hudi-bot commented on PR #6660:
URL: https://github.com/apache/hudi/pull/6660#issuecomment-1288182996

   
   ## CI report:
   
   * 542d9421e85f8b745780102e2201982763ed8db3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12492)
 
   
   



[GitHub] [hudi] xushiyan commented on a diff in pull request #6878: [HUDI-3397] Guard repeated rdd triggers

2022-10-23 Thread GitBox


xushiyan commented on code in PR #6878:
URL: https://github.com/apache/hudi/pull/6878#discussion_r1002756379


##
hudi-common/src/main/java/org/apache/hudi/common/data/HoodieListData.java:
##
@@ -148,6 +148,11 @@ public  HoodieData 
distinctWithKey(SerializableFunction keyGetter, i
 .values();
   }
 
+  @Override
+  public int getNumPartitions() {
+return 1;

Review Comment:
   Unneeded change; to be reverted.



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java:
##
@@ -176,7 +177,10 @@ public HoodieWriteMetadata> 
execute(HoodieData writeStatuses = 
mapPartitionsAsRDD(inputRecordsWithClusteringUpdate, partitioner);
 HoodieWriteMetadata> result = new 
HoodieWriteMetadata<>();
-updateIndexAndCommitIfNeeded(writeStatuses, result);
+// dereference rdd so that no double de-referencing can happen by mistake.
+int numPartitions = Math.max(1, writeStatuses.getNumPartitions());

Review Comment:
   I don't think we need to guard it with a minimum of 1. The `getNumPartitions()` 
API should guarantee a meaningful return value.






[GitHub] [hudi] hudi-bot commented on pull request #7003: [minor] add more test for rfc46

2022-10-23 Thread GitBox


hudi-bot commented on PR #7003:
URL: https://github.com/apache/hudi/pull/7003#issuecomment-1288172095

   
   ## CI report:
   
   * 9671826a7dfc417a79ad00e4eb4feec09853acb2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12465)
 
   * 77ff687b1e0e945d6658ffe47992bd85484d78b2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12497)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7003: [minor] add more test for rfc46

2022-10-23 Thread GitBox


hudi-bot commented on PR #7003:
URL: https://github.com/apache/hudi/pull/7003#issuecomment-1288171209

   
   ## CI report:
   
   * 9671826a7dfc417a79ad00e4eb4feec09853acb2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12465)
 
   * 77ff687b1e0e945d6658ffe47992bd85484d78b2 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7021: [Minor] fix multi deser avro payload

2022-10-23 Thread GitBox


hudi-bot commented on PR #7021:
URL: https://github.com/apache/hudi/pull/7021#issuecomment-1288169004

   
   ## CI report:
   
   * 359ee069037b3e252564ef71668d8453e1481267 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12466) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12478) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12491)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map

2022-10-23 Thread GitBox


hudi-bot commented on PR #6632:
URL: https://github.com/apache/hudi/pull/6632#issuecomment-1288168839

   
   ## CI report:
   
   * d9e12ddf962b670b8ec1e2260d5389c688e16001 UNKNOWN
   * ba3513d5b65e39f7cbb71e851ddd34cfe9d846a0 UNKNOWN
   * 0836cbf5794ede5be427ef529cf7b660c2a6f4fa Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12480) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12490)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-10-23 Thread GitBox


hudi-bot commented on PR #7041:
URL: https://github.com/apache/hudi/pull/7041#issuecomment-1288143002

   
   ## CI report:
   
   * 3300f7bdbf9d1cb178390d36523db2ec0279448c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12496)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-10-23 Thread GitBox


hudi-bot commented on PR #7041:
URL: https://github.com/apache/hudi/pull/7041#issuecomment-1288142113

   
   ## CI report:
   
   * 3300f7bdbf9d1cb178390d36523db2ec0279448c UNKNOWN
   
   


