[GitHub] [hudi] danny0405 commented on a diff in pull request #6093: [HUDI-4385] Support to trigger the compaction in the flink batch mode write.

2022-07-24 Thread GitBox


danny0405 commented on code in PR #6093:
URL: https://github.com/apache/hudi/pull/6093#discussion_r927388993


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java:
##
@@ -95,6 +95,10 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context context) {
   pipeline = Pipelines.hoodieStreamWrite(conf, parallelism, hoodieRecordDataStream);
   // compaction
   if (OptionsResolver.needsAsyncCompaction(conf)) {
+// batch mode write must use syncCompaction.
+if (context.isBounded()) {
+  conf.setBoolean(FlinkOptions.COMPACTION_ASYNC_ENABLED, false);

Review Comment:
   In streaming execution mode, a bounded source also triggers checkpoints; 
should we disable async compaction in that case too?
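
The distinction raised in this comment (bounded input versus batch execution) can be sketched as a small decision function. This is only an illustration under stated assumptions: `CompactionModeSketch`, `ExecMode`, and `useAsyncCompaction` are hypothetical names that appear in neither Hudi nor Flink, and the rule encoded here (force sync compaction only for a bounded input in batch execution mode) is one possible reading of the discussion, not the behavior the PR settled on.

```java
// Hypothetical sketch: none of these names come from Hudi or Flink.
class CompactionModeSketch {

  /** How the job is executed, independent of whether the source is bounded. */
  enum ExecMode { STREAMING, BATCH }

  /**
   * Decides whether compaction may stay asynchronous.
   * Assumption: only a bounded input running in BATCH mode lacks the
   * checkpoints that drive async compaction, so only that combination
   * is forced to synchronous compaction.
   */
  static boolean useAsyncCompaction(boolean asyncRequested, boolean bounded, ExecMode mode) {
    if (!asyncRequested) {
      return false;
    }
    boolean batchWrite = bounded && mode == ExecMode.BATCH;
    return !batchWrite;
  }

  public static void main(String[] args) {
    // Bounded input, batch mode: fall back to sync compaction.
    System.out.println(useAsyncCompaction(true, true, ExecMode.BATCH));
    // Bounded input, streaming mode: checkpoints still fire, async stays on.
    System.out.println(useAsyncCompaction(true, true, ExecMode.STREAMING));
    // Unbounded input, streaming mode: async stays on.
    System.out.println(useAsyncCompaction(true, false, ExecMode.STREAMING));
  }
}
```

Checking `context.isBounded()` alone, as the diff does, would treat the second case like the first, which is the situation the comment asks about.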



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #6093: [HUDI-4385] Support to trigger the compaction in the flink batch mode write.

2022-07-24 Thread GitBox


danny0405 commented on code in PR #6093:
URL: https://github.com/apache/hudi/pull/6093#discussion_r928524041


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java:
##
@@ -95,6 +95,10 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context context) {
   pipeline = Pipelines.hoodieStreamWrite(conf, parallelism, hoodieRecordDataStream);
   // compaction
   if (OptionsResolver.needsAsyncCompaction(conf)) {
+// batch mode write must use syncCompaction.
+if (context.isBounded()) {
+  conf.setBoolean(FlinkOptions.COMPACTION_ASYNC_ENABLED, false);

Review Comment:
   Not exactly, because the bounded source can also be long running.






[GitHub] [hudi] danny0405 commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-07-24 Thread GitBox


danny0405 commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1193649030

   I am not very persuaded by the improvement numbers: 33% on reads and 9% on 
writes. If the numbers are real and reproducible, I would suggest lowering the 
priority of the patch, for example to after release 1.0.0.
   
   BTW, I had expected a performance improvement of about 5x to 10x.





[GitHub] [hudi] wzx140 commented on a diff in pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-07-24 Thread GitBox


wzx140 commented on code in PR #5629:
URL: https://github.com/apache/hudi/pull/5629#discussion_r928518937


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/table/log/HoodieFileSliceReader.java:
##
@@ -21,64 +21,46 @@
 
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.util.Option;
-import org.apache.hudi.common.util.SpillableMapUtils;
 import org.apache.hudi.common.util.collection.Pair;
-import org.apache.hudi.config.HoodiePayloadConfig;
 import org.apache.hudi.exception.HoodieIOException;
-import org.apache.hudi.io.storage.HoodieAvroFileReader;
+import org.apache.hudi.io.storage.HoodieFileReader;
 
 import org.apache.avro.Schema;
-import org.apache.avro.generic.GenericRecord;
 
 import java.io.IOException;
 import java.util.Iterator;
+import java.util.Properties;
 import java.util.stream.StreamSupport;
 
 /**
  * Reads records from base file and merges any updates from log files and provides iterable over all records in the file slice.
  */
 public class HoodieFileSliceReader implements Iterator> {
+
   private final Iterator> recordsIterator;
 
   public static HoodieFileSliceReader getFileSliceReader(
-  Option baseFileReader, HoodieMergedLogRecordScanner scanner, Schema schema, String payloadClass,
-  String preCombineField, Option> simpleKeyGenFieldsOpt) throws IOException {
+  Option baseFileReader, HoodieMergedLogRecordScanner scanner, Schema schema, Properties props, Option> simpleKeyGenFieldsOpt) throws IOException {
 if (baseFileReader.isPresent()) {
-  Iterator baseIterator = baseFileReader.get().getRecordIterator(schema);
+  Iterator baseIterator = baseFileReader.get().getRecordIterator(schema);
   while (baseIterator.hasNext()) {
-GenericRecord record = (GenericRecord) baseIterator.next();
-HoodieRecord hoodieRecord = transform(
-record, scanner, payloadClass, preCombineField, simpleKeyGenFieldsOpt);
-scanner.processNextRecord(hoodieRecord);
+scanner.processNextRecord(baseIterator.next().expansion(props, simpleKeyGenFieldsOpt,
+scanner.isWithOperationField(), scanner.getPartitionName(), false));
   }
   return new HoodieFileSliceReader(scanner.iterator());
 } else {
   Iterable iterable = () -> scanner.iterator();
-  HoodiePayloadConfig payloadConfig = HoodiePayloadConfig.newBuilder().withPayloadOrderingField(preCombineField).build();
   return new HoodieFileSliceReader(StreamSupport.stream(iterable.spliterator(), false)
   .map(e -> {
 try {
-  GenericRecord record = (GenericRecord) e.toIndexedRecord(schema, payloadConfig.getProps()).get();
-  return transform(record, scanner, payloadClass, preCombineField, simpleKeyGenFieldsOpt);
+  return e.expansion(props, simpleKeyGenFieldsOpt, scanner.isWithOperationField(), scanner.getPartitionName(), false);

Review Comment:
   I looked at this carefully and found that the expansion function is actually 
needed here. I also changed the function names: expansion -> getKeyWithParams 
and transform -> getKeyWithKeyGen.






[jira] [Updated] (HUDI-4458) Add a converter cache for flink ColumnStatsIndices

2022-07-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4458:
-
Labels: pull-request-available  (was: )

> Add a converter cache for flink ColumnStatsIndices
> --
>
> Key: HUDI-4458
> URL: https://issues.apache.org/jira/browse/HUDI-4458
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] danny0405 opened a new pull request, #6205: [HUDI-4458] Add a converter cache for flink ColumnStatsIndices

2022-07-24 Thread GitBox


danny0405 opened a new pull request, #6205:
URL: https://github.com/apache/hudi/pull/6205

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[jira] [Created] (HUDI-4459) Corrupt parquet file created when syncing huge table with 4000+ fields,using hudi cow table with bulk_insert type

2022-07-24 Thread Leo zhang (Jira)
Leo zhang created HUDI-4459:
---

 Summary: Corrupt parquet file created when syncing huge table with 
4000+ fields,using hudi cow table with bulk_insert type
 Key: HUDI-4459
 URL: https://issues.apache.org/jira/browse/HUDI-4459
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Leo zhang
 Attachments: statements.sql, table.ddl

I am trying to sync a huge table with 4000+ fields into Hudi, using a COW table 
with the bulk_insert operation type.

The job finishes without any exception, but when I try to read data from the 
table, I get an empty result. The parquet file is corrupted and can't be read 
correctly.

I tried to trace the problem and found it was caused by SortOperator. After 
the record is serialized in the sorter, all the fields get out of order and 
are deserialized into one field. Finally, the wrong record is written into the 
parquet file, which makes the file unreadable.


Here are a few steps to reproduce the bug in the Flink sql-client:

1. Execute the table DDL (provided in the table.ddl file in the attachments).

2. Execute the insert statement (provided in the statements.sql file in the 
attachments).

3. Execute a select statement to query the Hudi table (provided in the 
statements.sql file in the attachments).
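
For readers without access to the attachments, a table of comparable width can be generated programmatically. The sketch below is not the reporter's DDL: the class name, the `id` key column, the `f0`..`fN` STRING columns, and the path are all made-up placeholders, and the connector option keys (`table.type`, `write.operation`) are written as I understand the Hudi Flink connector, so the attached table.ddl remains the authoritative reproduction.

```java
// Hypothetical helper: generates a Flink SQL DDL for a very wide Hudi table.
class WideTableDdlSketch {

  /**
   * Builds a CREATE TABLE statement with one key column plus fieldCount
   * STRING columns named f0..f(fieldCount-1). Column names, types, and the
   * path are placeholders, not taken from the attached table.ddl.
   */
  static String buildDdl(String tableName, int fieldCount, String path) {
    StringBuilder sb = new StringBuilder();
    sb.append("CREATE TABLE ").append(tableName).append(" (\n");
    sb.append("  id BIGINT PRIMARY KEY NOT ENFORCED");
    for (int i = 0; i < fieldCount; i++) {
      sb.append(",\n  f").append(i).append(" STRING");
    }
    sb.append("\n) WITH (\n")
        .append("  'connector' = 'hudi',\n")
        .append("  'path' = '").append(path).append("',\n")
        .append("  'table.type' = 'COPY_ON_WRITE',\n")
        .append("  'write.operation' = 'bulk_insert'\n")
        .append(")");
    return sb.toString();
  }

  public static void main(String[] args) {
    // Print a 4000-column DDL that can be pasted into the Flink sql-client.
    System.out.println(buildDdl("t_wide", 4000, "file:///tmp/t_wide"));
  }
}
```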





[jira] [Created] (HUDI-4458) Add a converter cache for flink ColumnStatsIndices

2022-07-24 Thread Danny Chen (Jira)
Danny Chen created HUDI-4458:


 Summary: Add a converter cache for flink ColumnStatsIndices
 Key: HUDI-4458
 URL: https://issues.apache.org/jira/browse/HUDI-4458
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Danny Chen
 Fix For: 0.12.0








[GitHub] [hudi] hudi-bot commented on pull request #6203: [HUDI-4456] Clean up test resources

2022-07-24 Thread GitBox


hudi-bot commented on PR #6203:
URL: https://github.com/apache/hudi/pull/6203#issuecomment-1193617884

   
   ## CI report:
   
   * 745324e449ab6c81eabd274bfbb15a8d5fb3918e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10300)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool

2022-07-24 Thread GitBox


hudi-bot commented on PR #6202:
URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193617853

   
   ## CI report:
   
   * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN
   * 45a5851255b57276491a3a8914783fefdc5563cc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10295)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


hudi-bot commented on PR #5943:
URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193617474

   
   ## CI report:
   
   * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN
   * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN
   * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN
   * 9eece632cdd0f0c55fc81742586d8ef3ecbb769a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10296)
 
   * c015e22540af7ea164c1216874e37202b8cae10e UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6203: [HUDI-4456] Clean up test resources

2022-07-24 Thread GitBox


hudi-bot commented on PR #6203:
URL: https://github.com/apache/hudi/pull/6203#issuecomment-1193614845

   
   ## CI report:
   
   * 745324e449ab6c81eabd274bfbb15a8d5fb3918e UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6203: [HUDI-4456] Clean up test resources

2022-07-24 Thread GitBox


hudi-bot commented on PR #6203:
URL: https://github.com/apache/hudi/pull/6203#issuecomment-1193612560

   
   ## CI report:
   
   * b98b402fdadec6c219e1d2a50f76e606ecd1ba75 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10291)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193609887

   
   ## CI report:
   
   * 16ff6fba9e82e35bfb202902f22e6c59ade998ff Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10298)
 
   
   



[hudi] branch master updated: [MINOR] Only log stdout output for non-zero exit from commands in IT (#6199)

2022-07-24 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new f6e7227ed5 [MINOR] Only log stdout output for non-zero exit from 
commands in IT (#6199)
f6e7227ed5 is described below

commit f6e7227ed548ea5bac66e224df42e2985fb814a9
Author: Y Ethan Guo 
AuthorDate: Sun Jul 24 22:08:33 2022 -0700

[MINOR] Only log stdout output for non-zero exit from commands in IT (#6199)
---
 hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java b/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
index 8115d50a78..dcb6367802 100644
--- a/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
+++ b/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
@@ -236,7 +236,9 @@ public abstract class ITTestBase {
 int exitCode = dockerClient.inspectExecCmd(createCmdResponse.getId()).exec().getExitCode();
 LOG.info("Exit code for command : " + exitCode);
 
-LOG.error("\n\n ## Stdout ###\n" + callback.getStdout().toString());
+if (exitCode != 0) {
+  LOG.error("\n\n ## Stdout ###\n" + callback.getStdout().toString());
+}
 LOG.error("\n\n ## Stderr ###\n" + callback.getStderr().toString());
 
 if (checkIfSucceed) {
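
The effect of the change above can be illustrated in isolation. The following is a minimal sketch, not the ITTestBase code: `CommandLogSketch` and `linesToLog` are hypothetical names, and the method returns the lines that would be logged instead of calling a logger so that the rule (stdout only on a non-zero exit code, stderr always) is easy to check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the logging rule introduced by the commit above.
class CommandLogSketch {

  /**
   * Returns the lines that would be logged for a finished command:
   * the exit code always, stdout only when the exit code is non-zero,
   * and stderr always.
   */
  static List<String> linesToLog(int exitCode, String stdout, String stderr) {
    List<String> lines = new ArrayList<>();
    lines.add("Exit code for command : " + exitCode);
    if (exitCode != 0) {
      lines.add("Stdout: " + stdout);
    }
    lines.add("Stderr: " + stderr);
    return lines;
  }

  public static void main(String[] args) {
    // Successful command: stdout is suppressed.
    linesToLog(0, "lots of output", "").forEach(System.out::println);
    // Failed command: stdout is included for troubleshooting.
    linesToLog(1, "lots of output", "boom").forEach(System.out::println);
  }
}
```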



[GitHub] [hudi] xushiyan merged pull request #6199: [MINOR] Only log stdout output for non-zero exit from commands in IT

2022-07-24 Thread GitBox


xushiyan merged PR #6199:
URL: https://github.com/apache/hudi/pull/6199





[GitHub] [hudi] xushiyan commented on pull request #6199: [MINOR] Only log stdout output for non-zero exit from commands in IT

2022-07-24 Thread GitBox


xushiyan commented on PR #6199:
URL: https://github.com/apache/hudi/pull/6199#issuecomment-1193583216

   https://issues.apache.org/jira/browse/HUDI-4457
   @yihua we can follow up on this. will land this. (CI failure is due to 
irrelevant flakiness)





[jira] [Updated] (HUDI-4457) Make sure IT docker test return code non-zero when failed

2022-07-24 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4457:
-
Description: 
Some IT test cases run a docker command that returns exit code 0 even though 
the test actually failed. This is misleading for troubleshooting.

TODO
1. verify the behavior
2. fix it

> Make sure IT docker test return code non-zero when failed
> -
>
> Key: HUDI-4457
> URL: https://issues.apache.org/jira/browse/HUDI-4457
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: tests-ci
>Reporter: Raymond Xu
>Priority: Major
>
> Some IT test cases run a docker command that returns exit code 0 even though 
> the test actually failed. This is misleading for troubleshooting.
> TODO
> 1. verify the behavior
> 2. fix it





[jira] [Created] (HUDI-4457) Make sure IT docker test return code non-zero when failed

2022-07-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-4457:


 Summary: Make sure IT docker test return code non-zero when failed
 Key: HUDI-4457
 URL: https://issues.apache.org/jira/browse/HUDI-4457
 Project: Apache Hudi
  Issue Type: Bug
  Components: tests-ci
Reporter: Raymond Xu








[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


hudi-bot commented on PR #5943:
URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193581179

   
   ## CI report:
   
   * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN
   * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN
   * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN
   * 9eece632cdd0f0c55fc81742586d8ef3ecbb769a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10296)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193577362

   
   ## CI report:
   
   * 34485e3a7df2712077f5987f930b7a6fa33a3986 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10294)
 
   * 16ff6fba9e82e35bfb202902f22e6c59ade998ff Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10298)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


hudi-bot commented on PR #5943:
URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193577152

   
   ## CI report:
   
   * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN
   * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN
   * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN
   * 193bafdc92afe1e410b5e58ef59ab46fd9fd4fb9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10287)
 
   * 9eece632cdd0f0c55fc81742586d8ef3ecbb769a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10296)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool

2022-07-24 Thread GitBox


hudi-bot commented on PR #6202:
URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193575406

   
   ## CI report:
   
   * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN
   * 8ef79398f29f16623e470320af4db1a113d14dab Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10290)
 
   * 45a5851255b57276491a3a8914783fefdc5563cc Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10295)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


hudi-bot commented on PR #5943:
URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193575153

   
   ## CI report:
   
   * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN
   * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN
   * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN
   * 193bafdc92afe1e410b5e58ef59ab46fd9fd4fb9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10287)
 
   * 9eece632cdd0f0c55fc81742586d8ef3ecbb769a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193573133

   
   ## CI report:
   
   * 34485e3a7df2712077f5987f930b7a6fa33a3986 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10294)
 
   * 16ff6fba9e82e35bfb202902f22e6c59ade998ff UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool

2022-07-24 Thread GitBox


hudi-bot commented on PR #6202:
URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193573211

   
   ## CI report:
   
   * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN
   * 8ef79398f29f16623e470320af4db1a113d14dab Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10290)
 
   * 45a5851255b57276491a3a8914783fefdc5563cc UNKNOWN
   
   



[hudi] branch master updated: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness (#6201)

2022-07-24 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 76a28daeb0 [HUDI-4456] Close FileSystem in 
SparkClientFunctionalTestHarness  (#6201)
76a28daeb0 is described below

commit 76a28daeb08e7192d75dfc447624c827643bef0d
Author: Tim Brown 
AuthorDate: Sun Jul 24 21:42:15 2022 -0700

[HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness  (#6201)
---
 .../hudi/testutils/SparkClientFunctionalTestHarness.java  | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
index f9676c6c47..c58dd178dc 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
@@ -67,6 +67,7 @@ import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SQLContext;
 import org.apache.spark.sql.SparkSession;
 import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.AfterEach;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.io.TempDir;
 
@@ -96,6 +97,7 @@ public class SparkClientFunctionalTestHarness implements SparkProvider, HoodieMe
   private static transient JavaSparkContext jsc;
   private static transient HoodieSparkEngineContext context;
   private static transient TimelineService timelineService;
+  private FileSystem fileSystem;
 
   /**
* An indicator of the initialization status.
@@ -128,7 +130,10 @@ public class SparkClientFunctionalTestHarness implements SparkProvider, HoodieMe
   }
 
   public FileSystem fs() {
-return FSUtils.getFs(basePath(), hadoopConf());
+if (fileSystem == null) {
+  fileSystem = FSUtils.getFs(basePath(), hadoopConf());
+}
+return fileSystem;
   }
 
   @Override
@@ -208,6 +213,14 @@ public class SparkClientFunctionalTestHarness implements SparkProvider, HoodieMe
 }
   }
 
+  @AfterEach
+  public void closeFileSystem() throws IOException {
+if (fileSystem != null) {
+  fileSystem.close();
+  fileSystem = null;
+}
+  }
+
   protected JavaRDD tagLocation(
   HoodieIndex index, JavaRDD records, HoodieTable table) {
 return HoodieJavaRDD.getJavaRDD(
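
The pattern in this commit (create the resource lazily, reuse it, close and reset it after each test) can be shown without Hadoop or JUnit. This is a minimal sketch under stated assumptions: `LazyCloseableHolder`, `get`, and `closeIfOpen` are hypothetical names, and a plain `Closeable` stands in for the `FileSystem`; in the real harness the close step is the `@AfterEach` method shown in the diff.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical stand-in for the lazily created, per-test FileSystem field.
class LazyCloseableHolder<T extends Closeable> {
  private final Supplier<T> factory;
  private T resource;

  LazyCloseableHolder(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Creates the resource on first use and returns the cached one afterwards. */
  T get() {
    if (resource == null) {
      resource = factory.get();
    }
    return resource;
  }

  /** Closes the resource if one was created, then resets so the next get() recreates it. */
  void closeIfOpen() throws IOException {
    if (resource != null) {
      resource.close();
      resource = null;
    }
  }

  public static void main(String[] args) throws IOException {
    int[] created = {0};
    int[] closed = {0};
    LazyCloseableHolder<Closeable> holder = new LazyCloseableHolder<Closeable>(() -> {
      created[0]++;
      return () -> closed[0]++;
    });
    holder.get();
    holder.get();          // reuses the cached instance
    holder.closeIfOpen();  // what the @AfterEach hook would do
    holder.closeIfOpen();  // second call has nothing left to close
    System.out.println("created=" + created[0] + ", closed=" + closed[0]);
  }
}
```

Resetting the field to null after closing matters: without it, the next test would receive an already closed resource from `get()`.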



[GitHub] [hudi] xushiyan merged pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness

2022-07-24 Thread GitBox


xushiyan merged PR #6201:
URL: https://github.com/apache/hudi/pull/6201





[hudi] branch master updated: [MINOR] Fix typos in Spark client related classes (#6204)

2022-07-24 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2a08a65f71 [MINOR] Fix typos in Spark client related classes (#6204)
2a08a65f71 is described below

commit 2a08a65f719b5c155dde85a0dc318af5033c31d5
Author: Vander <30547463+vande...@users.noreply.github.com>
AuthorDate: Mon Jul 25 12:41:42 2022 +0800

[MINOR] Fix typos in Spark client related classes (#6204)
---
 .../clustering/run/strategy/SingleSparkJobExecutionStrategy.java| 2 +-
 .../org/apache/hudi/client/utils/SparkInternalSchemaConverter.java  | 4 ++--
 .../main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java | 2 +-
 .../org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java | 6 +++---
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SingleSparkJobExecutionStrategy.java
 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SingleSparkJobExecutionStrategy.java
index 1158d0ada4..bb6d3df5f1 100644
--- 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SingleSparkJobExecutionStrategy.java
+++ 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SingleSparkJobExecutionStrategy.java
@@ -136,7 +136,7 @@ public abstract class SingleSparkJobExecutionStrategy> 
performClusteringWithRecordsIterator(final Iterator> records, 
final int numOutputGroups,

final String instantTime,
diff --git 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkInternalSchemaConverter.java
 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkInternalSchemaConverter.java
index 8e086c2927..098870a60a 100644
--- 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkInternalSchemaConverter.java
+++ 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkInternalSchemaConverter.java
@@ -81,7 +81,7 @@ public class SparkInternalSchemaConverter {
   public static final String HOODIE_VALID_COMMITS_LIST = 
"hoodie.valid.commits.list";
 
   /**
-   * Converts a spark schema to an hudi internal schema. Fields without IDs 
are kept and assigned fallback IDs.
+   * Convert a spark schema to an hudi internal schema. Fields without IDs are 
kept and assigned fallback IDs.
*
* @param sparkSchema a spark schema
* @return a matching internal schema for the provided spark schema
@@ -157,7 +157,7 @@ public class SparkInternalSchemaConverter {
   }
 
   /**
-   * Converts Spark schema to Hudi internal schema, and prune fields.
+   * Convert Spark schema to Hudi internal schema, and prune fields.
* Fields without IDs are kept and assigned fallback IDs.
*
* @param sparkSchema a pruned spark schema
diff --git 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java
 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java
index fd083f2c89..a6d03eae2b 100644
--- 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java
+++ 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java
@@ -50,7 +50,7 @@ import java.util.stream.Stream;
 import scala.collection.JavaConverters;
 
 /**
- * Spark validator utils to verify and run any precommit validators configured.
+ * Spark validator utils to verify and run any pre-commit validators 
configured.
  */
 public class SparkValidatorUtils {
   private static final Logger LOG = 
LogManager.getLogger(BaseSparkCommitActionExecutor.class);
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java
index 491c6700c9..9e74d14c04 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java
@@ -308,7 +308,7 @@ public class HoodieAvroDataBlock extends HoodieDataBlock {
 ByteArrayOutputStream baos = new ByteArrayOutputStream();
 DataOutputStream output = new DataOutputStream(baos);
 
-// 2. Compress and Write schema out
+// 1. Compress and Write schema out
 byte[] schemaContent = compress(schema.toString());
 output.writeInt(schemaContent.length);
 output.write(schemaContent);
@@ -318,10 +318,10 @@ public class HoodieAvroDataBlock extends HoodieDataBlock {
   recordItr.forEachRemaining(records::add);
 }
 
-// 3. Write total number of 

[GitHub] [hudi] xushiyan merged pull request #6204: [MINOR] Fix typos in Spark client related classes

2022-07-24 Thread GitBox


xushiyan merged PR #6204:
URL: https://github.com/apache/hudi/pull/6204





[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193570330

   
   ## CI report:
   
   * 34485e3a7df2712077f5987f930b7a6fa33a3986 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10294)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness

2022-07-24 Thread GitBox


hudi-bot commented on PR #6201:
URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193570427

   
   ## CI report:
   
   * 36cc806477cb75f8c168ce0420849886ab5e650f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10289)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] the-other-tim-brown closed pull request #6190: a simple test

2022-07-24 Thread GitBox


the-other-tim-brown closed pull request #6190: a simple test
URL: https://github.com/apache/hudi/pull/6190





[GitHub] [hudi] vanderzh commented on a diff in pull request #6204: Fix typos in Spark client related classes

2022-07-24 Thread GitBox


vanderzh commented on code in PR #6204:
URL: https://github.com/apache/hudi/pull/6204#discussion_r928459060


##
.idea/vcs.xml:
##
@@ -1,36 +1,6 @@
 
-

[GitHub] [hudi] xushiyan closed pull request #5643: [HUDI-4071] Change defaults for some of the configs

2022-07-24 Thread GitBox


xushiyan closed pull request #5643: [HUDI-4071] Change defaults for some of the 
configs
URL: https://github.com/apache/hudi/pull/5643





[GitHub] [hudi] xushiyan commented on a diff in pull request #6204: Fix typos in Spark client related classes

2022-07-24 Thread GitBox


xushiyan commented on code in PR #6204:
URL: https://github.com/apache/hudi/pull/6204#discussion_r928457355


##
.idea/vcs.xml:
##
@@ -1,36 +1,6 @@
 
-

[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193540808

   
   ## CI report:
   
   * dbac26f88b14a8df88eba2ca70d566f2db53e412 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10286)
 
   * 34485e3a7df2712077f5987f930b7a6fa33a3986 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10294)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193533627

   
   ## CI report:
   
   * dbac26f88b14a8df88eba2ca70d566f2db53e412 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10286)
 
   * 34485e3a7df2712077f5987f930b7a6fa33a3986 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6204: Fix typos in Spark client related classes

2022-07-24 Thread GitBox


hudi-bot commented on PR #6204:
URL: https://github.com/apache/hudi/pull/6204#issuecomment-1193526000

   
   ## CI report:
   
   * 8bee5ca11e11c53a2100097c8106bbff9aaf5871 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10293)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6204: Fix typos in Spark client related classes

2022-07-24 Thread GitBox


hudi-bot commented on PR #6204:
URL: https://github.com/apache/hudi/pull/6204#issuecomment-1193520249

   
   ## CI report:
   
   * 8bee5ca11e11c53a2100097c8106bbff9aaf5871 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness

2022-07-24 Thread GitBox


hudi-bot commented on PR #6201:
URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193520209

   
   ## CI report:
   
   * 36cc806477cb75f8c168ce0420849886ab5e650f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10289)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness

2022-07-24 Thread GitBox


hudi-bot commented on PR #6201:
URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193518013

   
   ## CI report:
   
   * 36cc806477cb75f8c168ce0420849886ab5e650f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] vanderzh opened a new pull request, #6204: Fix typos in Spark client related classes

2022-07-24 Thread GitBox


vanderzh opened a new pull request, #6204:
URL: https://github.com/apache/hudi/pull/6204

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   This PR fixes a few typos in Spark client related classes.
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





[GitHub] [hudi] codope commented on pull request #6098: [HUDI-4389] Make HoodieStreamingSink idempotent

2022-07-24 Thread GitBox


codope commented on PR #6098:
URL: https://github.com/apache/hudi/pull/6098#issuecomment-1193500450

   > I did not fully understand the bulk insert row writing part. But Can we 
get it fixed in 0.12 please
   
   Yes that's gonna be in 0.12. It's in #6099 but stacked on top of this one. I 
will decouple the two.





[GitHub] [hudi] vinothchandar commented on a diff in pull request #6098: [HUDI-4389] Make HoodieStreamingSink idempotent

2022-07-24 Thread GitBox


vinothchandar commented on code in PR #6098:
URL: https://github.com/apache/hudi/pull/6098#discussion_r928369447


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala:
##
@@ -84,20 +96,62 @@ class HoodieStreamingSink(sqlContext: SQLContext,
 var updatedOptions = options.updated(HoodieWriteConfig.MARKERS_TYPE.key(), 
MarkerType.DIRECT.name())
 // we need auto adjustment enabled for streaming sink since async table 
services are feasible within the same JVM.
 updatedOptions = 
updatedOptions.updated(HoodieWriteConfig.AUTO_ADJUST_LOCK_CONFIGS.key, "true")
+// disable row writer bulk insert of write stream
+if (options.getOrDefault(OPERATION.key, 
UPSERT_OPERATION_OPT_VAL).equalsIgnoreCase(BULK_INSERT_OPERATION_OPT_VAL)) {

Review Comment:
   Row writing is a top priority no? Love to understand this more.



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala:
##
@@ -247,4 +285,18 @@ class HoodieStreamingSink(sqlContext: SQLContext,
   writeClient = Option.empty
 }
   }
+
+  private def canSkipBatch(batchId: Long): Boolean = {
+// get the latest checkpoint from the commit metadata to check if the 
microbatch has already been prcessed or not
+val lastCommit = 
metaClient.get.getActiveTimeline.getCommitsTimeline.filterCompletedInstants().lastInstant()
+if (lastCommit.isPresent) {
+  val commitMetadata = HoodieCommitMetadata.fromBytes(
+
metaClient.get.getActiveTimeline.getInstantDetails(lastCommit.get()).get(), 
classOf[HoodieCommitMetadata])
+  val lastCheckpoint = commitMetadata.getMetadata(SinkCheckpointKey)
+  if (!StringUtils.isNullOrEmpty(lastCheckpoint)) {
+latestBatchId = lastCheckpoint.toLong
+  }
+}
+latestBatchId >= batchId

Review Comment:
   +1 Might be good to make the data model support multiple values from day 1 
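   One way to read the "multiple values" suggestion is sketched below: instead of a single checkpoint value, keep the last committed batch id per sink identifier. This is only an illustration under stated assumptions; `SinkCheckpointsSketch`, `recordCommit`, and the `sinkId` parameter are hypothetical names and are not part of the PR or of Hudi's API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: tracks the last committed micro-batch id per sink. */
final class SinkCheckpointsSketch {
  // sinkId -> highest batch id known to be committed (assumed data model)
  private final Map<String, Long> lastBatchBySink = new HashMap<>();

  /** Records a committed batch; an older batch id never overwrites a newer one. */
  void recordCommit(String sinkId, long batchId) {
    lastBatchBySink.merge(sinkId, batchId, Long::max);
  }

  /** A batch can be skipped only if this sink already committed it or a later one. */
  boolean canSkipBatch(String sinkId, long batchId) {
    Long last = lastBatchBySink.get(sinkId);
    return last != null && last >= batchId;
  }
}
```

   With a single sink this reduces to the `latestBatchId >= batchId` check in the diff; an unknown sink never skips.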



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala:
##
@@ -48,12 +50,24 @@ class HoodieStreamingSink(sqlContext: SQLContext,
 
   private val log = LogManager.getLogger(classOf[HoodieStreamingSink])
 
-  private val retryCnt = 
options.getOrDefault(DataSourceWriteOptions.STREAMING_RETRY_CNT.key,
-DataSourceWriteOptions.STREAMING_RETRY_CNT.defaultValue).toInt
-  private val retryIntervalMs = 
options.getOrDefault(DataSourceWriteOptions.STREAMING_RETRY_INTERVAL_MS.key,
-DataSourceWriteOptions.STREAMING_RETRY_INTERVAL_MS.defaultValue).toLong
-  private val ignoreFailedBatch = 
options.getOrDefault(DataSourceWriteOptions.STREAMING_IGNORE_FAILED_BATCH.key,
-
DataSourceWriteOptions.STREAMING_IGNORE_FAILED_BATCH.defaultValue).toBoolean
+  private val tablePath = new Path(options.getOrElse("path", "Missing 'path' 
option"))
+  private var metaClient: Option[HoodieTableMetaClient] = {
+try {
+  
Some(HoodieTableMetaClient.builder().setConf(sqlContext.sparkContext.hadoopConfiguration).setBasePath(tablePath.toString).build())
+} catch {
+  case _: TableNotFoundException =>
+log.warn("Ignore TableNotFoundException as it is first microbatch.")
+Option.empty
+}
+  }
+  private val retryCnt = options.getOrDefault(STREAMING_RETRY_CNT.key,
+STREAMING_RETRY_CNT.defaultValue).toInt
+  private val retryIntervalMs = 
options.getOrDefault(STREAMING_RETRY_INTERVAL_MS.key,
+STREAMING_RETRY_INTERVAL_MS.defaultValue).toLong
+  private val ignoreFailedBatch = 
options.getOrDefault(STREAMING_IGNORE_FAILED_BATCH.key,

Review Comment:
   TBH I think we should make it fail by default and not ignore. The original 
author from Apple wanted it that way for them, but it probably does not make sense 
at this point anymore.
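   To make the two behaviours concrete, here is a minimal sketch of a retry loop in which a failed batch is fatal unless the caller opts in to ignoring it. The parameter names mirror `retryCnt`, `retryIntervalMs`, and `ignoreFailedBatch` from the diff; the class `BatchRetrySketch` and the method `writeWithRetry` are hypothetical and do not reflect the actual sink implementation.

```java
/** Hypothetical sketch of "fail by default" retry handling for a micro-batch. */
final class BatchRetrySketch {

  /**
   * Runs the write up to retryCnt times (at least once).
   * Returns true on success, false if every attempt failed and failures are ignored.
   * Throws if every attempt failed and ignoreFailedBatch is false.
   */
  static boolean writeWithRetry(Runnable writeBatch, int retryCnt,
                                long retryIntervalMs, boolean ignoreFailedBatch) {
    RuntimeException lastError = null;
    int attempts = Math.max(1, retryCnt);
    for (int attempt = 1; attempt <= attempts; attempt++) {
      try {
        writeBatch.run();
        return true;
      } catch (RuntimeException e) {
        lastError = e;
      }
      if (attempt < attempts) {
        try {
          Thread.sleep(retryIntervalMs); // wait before the next attempt
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
          break;
        }
      }
    }
    if (ignoreFailedBatch) {
      return false; // opt-in: drop the failed batch and continue
    }
    throw new IllegalStateException("Micro-batch write failed", lastError);
  }
}
```

   Under this sketch, flipping the default simply means callers that do not pass `ignoreFailedBatch = true` see the exception.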






[GitHub] [hudi] boneanxs commented on a diff in pull request #6028: [HUDI-4355] Bulk insert As Row: Should also repartiiton records if populateMetaFields is false

2022-07-24 Thread GitBox


boneanxs commented on code in PR #6028:
URL: https://github.com/apache/hudi/pull/6028#discussion_r928368597


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -523,17 +523,19 @@ object HoodieSparkSqlWriter {
 val params: mutable.Map[String, String] = 
collection.mutable.Map(parameters.toSeq: _*)
 params(HoodieWriteConfig.AVRO_SCHEMA_STRING.key) = schema.toString
 val writeConfig = DataSourceUtils.createHoodieConfig(schema.toString, 
path, tblName, mapAsJavaMap(params))
-val bulkInsertPartitionerRows: BulkInsertPartitioner[Dataset[Row]] = if 
(populateMetaFields) {
+val bulkInsertPartitionerRows: BulkInsertPartitioner[Dataset[Row]] = {
   val userDefinedBulkInsertPartitionerOpt = 
DataSourceUtils.createUserDefinedBulkInsertPartitionerWithRows(writeConfig)

Review Comment:
   Should we add a new method to `partitioner` that validates whether the columns 
meet its requirements (for example, it returns mandatoryFields and we check against them)? 
Currently, if users set a user-defined partitioner that requires meta fields, we 
still accept it and do not throw an error...
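   A rough sketch of that idea, assuming a partitioner can declare the columns it needs: the interface `RowPartitionerSketch`, its `mandatoryFields()` method, and `PartitionerValidation.validate` are all hypothetical names for illustration and do not exist in `BulkInsertPartitioner` today.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical partitioner contract that can declare required columns. */
interface RowPartitionerSketch {
  /** Columns the partitioner needs in the incoming rows; none by default. */
  default Set<String> mandatoryFields() {
    return Collections.emptySet();
  }
}

/** Hypothetical check run before the partitioner is used. */
final class PartitionerValidation {
  static void validate(RowPartitionerSketch partitioner, Collection<String> availableColumns) {
    Set<String> missing = new TreeSet<>(partitioner.mandatoryFields());
    missing.removeAll(availableColumns);
    if (!missing.isEmpty()) {
      throw new IllegalArgumentException("Partitioner requires missing columns: " + missing);
    }
  }
}
```

   With something like this, a user-defined partitioner that needs `_hoodie_record_key` would be rejected up front when meta fields are not populated, rather than being accepted silently.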
   






[GitHub] [hudi] vinothchandar commented on a diff in pull request #6098: [HUDI-4389] Make HoodieStreamingSink idempotent

2022-07-24 Thread GitBox


vinothchandar commented on code in PR #6098:
URL: https://github.com/apache/hudi/pull/6098#discussion_r928368463


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala:
##
@@ -48,12 +50,24 @@ class HoodieStreamingSink(sqlContext: SQLContext,
 
   private val log = LogManager.getLogger(classOf[HoodieStreamingSink])
 
-  private val retryCnt = 
options.getOrDefault(DataSourceWriteOptions.STREAMING_RETRY_CNT.key,
-DataSourceWriteOptions.STREAMING_RETRY_CNT.defaultValue).toInt
-  private val retryIntervalMs = 
options.getOrDefault(DataSourceWriteOptions.STREAMING_RETRY_INTERVAL_MS.key,
-DataSourceWriteOptions.STREAMING_RETRY_INTERVAL_MS.defaultValue).toLong
-  private val ignoreFailedBatch = 
options.getOrDefault(DataSourceWriteOptions.STREAMING_IGNORE_FAILED_BATCH.key,
-
DataSourceWriteOptions.STREAMING_IGNORE_FAILED_BATCH.defaultValue).toBoolean
+  private val tablePath = new Path(options.getOrElse("path", "Missing 'path' 
option"))
+  private var metaClient: Option[HoodieTableMetaClient] = {
+try {
+  
Some(HoodieTableMetaClient.builder().setConf(sqlContext.sparkContext.hadoopConfiguration).setBasePath(tablePath.toString).build())
+} catch {
+  case _: TableNotFoundException =>
+log.warn("Ignore TableNotFoundException as it is first microbatch.")
+Option.empty
+}
+  }
+  private val retryCnt = options.getOrDefault(STREAMING_RETRY_CNT.key,
+STREAMING_RETRY_CNT.defaultValue).toInt
+  private val retryIntervalMs = 
options.getOrDefault(STREAMING_RETRY_INTERVAL_MS.key,
+STREAMING_RETRY_INTERVAL_MS.defaultValue).toLong
+  private val ignoreFailedBatch = 
options.getOrDefault(STREAMING_IGNORE_FAILED_BATCH.key,
+STREAMING_IGNORE_FAILED_BATCH.defaultValue).toBoolean
+  // This constant serves as the checkpoint key for streaming sink so that 
each microbatch is processed exactly-once.
+  private val SinkCheckpointKey = "_streaming_sink_checkpoint"

Review Comment:
   Add a " _ hudi " prefix to the key? 






[GitHub] [hudi] LinMingQiang commented on a diff in pull request #6093: [HUDI-4385] Support to trigger the compaction in the flink batch mode write.

2022-07-24 Thread GitBox


LinMingQiang commented on code in PR #6093:
URL: https://github.com/apache/hudi/pull/6093#discussion_r928363020


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java:
##
@@ -95,6 +95,10 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context 
context) {
   pipeline = Pipelines.hoodieStreamWrite(conf, parallelism, 
hoodieRecordDataStream);
   // compaction
   if (OptionsResolver.needsAsyncCompaction(conf)) {
+// batch mode write must use syncCompaction.
+if (context.isBounded()) {
+  conf.setBoolean(FlinkOptions.COMPACTION_ASYNC_ENABLED, false);

Review Comment:
   My idea is that when the source is bounded, we should not do compaction in 
checkpoint, because compaction will be done once in `endinput`. Am I right?
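   The proposal can be written down as a tiny decision function. This is a sketch of the reasoning only, assuming that a bounded source always gets its single compaction at end of input; `CompactionModeSketch` and `effectiveAsyncCompaction` are hypothetical names, and the open question about bounded sources in streaming execution mode is not modelled here.

```java
/** Hypothetical sketch of the "bounded source disables async compaction" rule. */
final class CompactionModeSketch {

  /**
   * asyncRequested: the user-configured async compaction flag.
   * boundedSource:  whether the sink context reports a bounded input.
   * Returns whether compaction should still be scheduled on checkpoints.
   */
  static boolean effectiveAsyncCompaction(boolean asyncRequested, boolean boundedSource) {
    // Assumption: for bounded input, compaction runs once at end of input,
    // so checkpoint-driven async compaction is switched off.
    return asyncRequested && !boundedSource;
  }
}
```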






[GitHub] [hudi] danny0405 commented on a diff in pull request #5643: [HUDI-4071] Change defaults for some of the configs

2022-07-24 Thread GitBox


danny0405 commented on code in PR #5643:
URL: https://github.com/apache/hudi/pull/5643#discussion_r928360330


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -349,7 +349,7 @@ public class HoodieWriteConfig extends HoodieConfig {
 
   public static final ConfigProperty 
EMBEDDED_TIMELINE_SERVER_USE_ASYNC_ENABLE = ConfigProperty
   .key("hoodie.embed.timeline.server.async")
-  .defaultValue("false")
+  .defaultValue("true")
   .withDocumentation("Controls whether or not, the requests to the 
timeline server are processed in asynchronous fashion, "

Review Comment:
   30+ commits is too few to reproduce the problem. In #6179, we ran about 2000+ 
commits to reproduce it. I would suggest running the same test before switching 
the flag.






[GitHub] [hudi] danny0405 commented on a diff in pull request #5643: [HUDI-4071] Change defaults for some of the configs

2022-07-24 Thread GitBox


danny0405 commented on code in PR #5643:
URL: https://github.com/apache/hudi/pull/5643#discussion_r928360330


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -349,7 +349,7 @@ public class HoodieWriteConfig extends HoodieConfig {
 
   public static final ConfigProperty 
EMBEDDED_TIMELINE_SERVER_USE_ASYNC_ENABLE = ConfigProperty
   .key("hoodie.embed.timeline.server.async")
-  .defaultValue("false")
+  .defaultValue("true")
   .withDocumentation("Controls whether or not, the requests to the 
timeline server are processed in asynchronous fashion, "

Review Comment:
   30+ commits is too few to reproduce the problem. In #6179, we ran about 
2000+ commits to reproduce it.






[GitHub] [hudi] jiezi2026 commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"

2022-07-24 Thread GitBox


jiezi2026 commented on issue #5765:
URL: https://github.com/apache/hudi/issues/5765#issuecomment-1193472473

   We also encountered the same problem with hudi-0.11.1 & spark-3.2.1, and our 
current temporary workaround is to set hoodie.metadata.enable=false.
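   For readers who want to try the same workaround, a minimal sketch of adding the option to a set of write options follows. The key `hoodie.metadata.enable` is the one named above; the helper class `MetadataWorkaroundSketch` and its method are hypothetical, and how the resulting map is handed to the writer depends on the job.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that applies the temporary workaround to write options. */
final class MetadataWorkaroundSketch {

  /** Returns a copy of the given options with the metadata table disabled. */
  static Map<String, String> withMetadataDisabled(Map<String, String> writeOptions) {
    Map<String, String> copy = new LinkedHashMap<>(writeOptions);
    copy.put("hoodie.metadata.enable", "false");
    return copy;
  }
}
```

   In a Spark job such a map would typically be passed through the writer's `options(...)` call; this only sidesteps the error and does not fix the underlying Hadoop version mismatch.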





[GitHub] [hudi] leesf commented on a diff in pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


leesf commented on code in PR #5943:
URL: https://github.com/apache/hudi/pull/5943#discussion_r928353695


##
hudi-spark-datasource/hudi-spark3.2.x/src/main/scala/org/apache/spark/sql/HoodieSpark32CatalystPlanUtils.scala:
##
@@ -13,7 +13,7 @@
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
- * limitations under the License.
+ * limitations under the License.a

Review Comment:
   please revert this change.






[GitHub] [hudi] leesf commented on a diff in pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


leesf commented on code in PR #5943:
URL: https://github.com/apache/hudi/pull/5943#discussion_r928353409


##
hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/adapter/BaseSpark3Adapter.scala:
##
@@ -81,23 +80,12 @@ abstract class BaseSpark3Adapter extends SparkAdapter with 
Logging {
 }
   }
 
-  override def createExtendedSparkParser: Option[(SparkSession, 
ParserInterface) => ParserInterface] = {
-// since spark3.2.1 support datasourceV2, so we need to a new SqlParser to 
deal DDL statment
-if (SPARK_VERSION.startsWith("3.1")) {
-  val loadClassName = 
"org.apache.spark.sql.parser.HoodieSpark312ExtendedSqlParser"
-  Some {
-(spark: SparkSession, delegate: ParserInterface) => {
-  val clazz = Class.forName(loadClassName, true, 
Thread.currentThread().getContextClassLoader)
-  val ctor = clazz.getConstructors.head
-  ctor.newInstance(spark, delegate).asInstanceOf[ParserInterface]
-}
-  }
-} else {
-  None
-}
-  }
-
   override def createInterpretedPredicate(e: Expression): InterpretedPredicate 
= {
 Predicate.createInterpreted(e)
   }
+
+  override def getQueryParserFromExtendedSqlParser(session: SparkSession, 
delegate: ParserInterface,

Review Comment:
   Can this method be defined in `SparkAdapter`, with the default implementation 
being unsupported?
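   The pattern being asked for looks roughly like the sketch below, written in Java for brevity even though the adapter is Scala. `AdapterSketch`, `parseQuery`, and the two implementing classes are hypothetical stand-ins, not the real `SparkAdapter` signature.

```java
/** Hypothetical adapter contract with an "unsupported by default" method. */
interface AdapterSketch {
  /** Versions that cannot provide this simply inherit the throwing default. */
  default String parseQuery(String sqlText) {
    throw new UnsupportedOperationException("parseQuery is not supported by this adapter");
  }
}

/** Hypothetical adapter for a version that supports the feature. */
final class NewVersionAdapterSketch implements AdapterSketch {
  @Override
  public String parseQuery(String sqlText) {
    return "parsed:" + sqlText.trim();
  }
}

/** Hypothetical adapter for a version that does not override the default. */
final class OldVersionAdapterSketch implements AdapterSketch {
}
```

   The benefit is that only adapters for versions that support the feature need to override the method, and callers get a clear error everywhere else.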






[hudi] branch master updated (a54c963543 -> 1a910fd473)

2022-07-24 Thread forwardxu
This is an automated email from the ASF dual-hosted git repository.

forwardxu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from a54c963543 [HUDI-4348] fix merge into sql data quality in concurrent 
scene (#6020)
 add 1a910fd473 [HUDI-3510] Add sync validate procedure (#6200)

No new revisions were added by this update.

Summary of changes:
 ...Command.java => HoodieSyncValidateCommand.java} |   2 +-
 .../hudi/command/procedures/HoodieProcedures.scala |   1 +
 .../procedures/ValidateHoodieSyncProcedure.scala   | 208 +
 3 files changed, 210 insertions(+), 1 deletion(-)
 rename hudi-cli/src/main/java/org/apache/hudi/cli/commands/{HoodieSyncCommand.java => HoodieSyncValidateCommand.java} (98%)
 create mode 100644 hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ValidateHoodieSyncProcedure.scala



[GitHub] [hudi] XuQianJin-Stars merged pull request #6200: [HUDI-3510] Add sync validate procedure

2022-07-24 Thread GitBox


XuQianJin-Stars merged PR #6200:
URL: https://github.com/apache/hudi/pull/6200





[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness

2022-07-24 Thread GitBox


hudi-bot commented on PR #6201:
URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193451342

   
   ## CI report:
   
   * 4345133281042a0f46f28765b285aca51a430c1b UNKNOWN
   * 36cc806477cb75f8c168ce0420849886ab5e650f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10289)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


hudi-bot commented on PR #5943:
URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193451044

   
   ## CI report:
   
   * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN
   * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN
   * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN
   * 193bafdc92afe1e410b5e58ef59ab46fd9fd4fb9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10287)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool

2022-07-24 Thread GitBox


hudi-bot commented on PR #6202:
URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193431217

   
   ## CI report:
   
   * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN
   * 8ef79398f29f16623e470320af4db1a113d14dab Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10290)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6203: [HUDI-4456] Clean up test resources

2022-07-24 Thread GitBox


hudi-bot commented on PR #6203:
URL: https://github.com/apache/hudi/pull/6203#issuecomment-1193428355

   
   ## CI report:
   
   * b98b402fdadec6c219e1d2a50f76e606ecd1ba75 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10291)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool

2022-07-24 Thread GitBox


hudi-bot commented on PR #6202:
URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193428343

   
   ## CI report:
   
   * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN
   * 8ef79398f29f16623e470320af4db1a113d14dab Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10290)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness

2022-07-24 Thread GitBox


hudi-bot commented on PR #6201:
URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193426832

   
   ## CI report:
   
   * ee5654e47b5c8b837073c2e83464163a25d9dc72 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10288)
 
   * 4345133281042a0f46f28765b285aca51a430c1b UNKNOWN
   * 36cc806477cb75f8c168ce0420849886ab5e650f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10289)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6203: [HUDI-4456] Clean up test resources

2022-07-24 Thread GitBox


hudi-bot commented on PR #6203:
URL: https://github.com/apache/hudi/pull/6203#issuecomment-1193426856

   
   ## CI report:
   
   * b98b402fdadec6c219e1d2a50f76e606ecd1ba75 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool

2022-07-24 Thread GitBox


hudi-bot commented on PR #6202:
URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193426841

   
   ## CI report:
   
   * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN
   * 8ef79398f29f16623e470320af4db1a113d14dab UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness

2022-07-24 Thread GitBox


hudi-bot commented on PR #6201:
URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193425361

   
   ## CI report:
   
   * ee5654e47b5c8b837073c2e83464163a25d9dc72 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10288)
 
   * 4345133281042a0f46f28765b285aca51a430c1b UNKNOWN
   * 36cc806477cb75f8c168ce0420849886ab5e650f UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool

2022-07-24 Thread GitBox


hudi-bot commented on PR #6202:
URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193425372

   
   ## CI report:
   
   * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN
   
   



[jira] [Updated] (HUDI-4456) Clean up test resources

2022-07-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4456:
-
Labels: pull-request-available  (was: )

> Clean up test resources
> ---
>
> Key: HUDI-4456
> URL: https://issues.apache.org/jira/browse/HUDI-4456
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: tests-ci
>Reporter: Raymond Xu
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] xushiyan opened a new pull request, #6203: [HUDI-4456] Clean up test resources

2022-07-24 Thread GitBox


xushiyan opened a new pull request, #6203:
URL: https://github.com/apache/hudi/pull/6203

   Clean up resources from local hdfs cluster and zookeeper cluster.





[jira] [Created] (HUDI-4456) Clean up test resources

2022-07-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-4456:


 Summary: Clean up test resources
 Key: HUDI-4456
 URL: https://issues.apache.org/jira/browse/HUDI-4456
 Project: Apache Hudi
  Issue Type: Improvement
  Components: tests-ci
Reporter: Raymond Xu
 Fix For: 0.12.0








[jira] [Assigned] (HUDI-4456) Clean up test resources

2022-07-24 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-4456:


Assignee: Timothy Brown

> Clean up test resources
> ---
>
> Key: HUDI-4456
> URL: https://issues.apache.org/jira/browse/HUDI-4456
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: tests-ci
>Reporter: Raymond Xu
>Assignee: Timothy Brown
>Priority: Major
> Fix For: 0.12.0
>
>






[jira] [Assigned] (HUDI-4441) Disable INFO level logs from tests

2022-07-24 Thread Timothy Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown reassigned HUDI-4441:
---

Assignee: Timothy Brown

> Disable INFO level logs from tests
> --
>
> Key: HUDI-4441
> URL: https://issues.apache.org/jira/browse/HUDI-4441
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> Since the log4j1-2 bridge upgrade, we have noticed that CI runs are logging 
> INFO level logs despite the minimum level being set to WARN in all 
> log4j-sure.properties files. To reproduce the issue, run any test locally and 
> you should see INFO level logs. This creates unnecessary noise and makes 
> failures painful to debug. We need to fix this.





[GitHub] [hudi] hudi-bot commented on pull request #6201: [minor] Close FileSystem in SparkClientFunctionalTestHarness

2022-07-24 Thread GitBox


hudi-bot commented on PR #6201:
URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193415275

   
   ## CI report:
   
   * ee5654e47b5c8b837073c2e83464163a25d9dc72 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10288)
 
   * 4345133281042a0f46f28765b285aca51a430c1b UNKNOWN
   
   



[GitHub] [hudi] the-other-tim-brown commented on a diff in pull request #6201: [minor] Close FileSystem in SparkClientFunctionalTestHarness

2022-07-24 Thread GitBox


the-other-tim-brown commented on code in PR #6201:
URL: https://github.com/apache/hudi/pull/6201#discussion_r928325681


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java:
##
@@ -208,6 +213,13 @@ public static synchronized void resetSpark() {
     }
   }
 
+  @AfterEach
+  public void closeFilesystem() throws IOException {
+    if (fileSystem != null) {
+      fileSystem.close();

Review Comment:
   Updated to set it to null after close
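
For readers following the thread, here is a minimal sketch of the close-then-null pattern being discussed. It is an assumption-laden illustration, not the code in the PR: `FakeFileSystem` and `HarnessSketch` are hypothetical, the lazy accessor `fs()` is assumed, and the real harness uses a Hadoop `FileSystem` with JUnit's `@AfterEach`.

```java
// Hypothetical stand-in for a closeable file system handle.
class FakeFileSystem implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical harness that caches the handle between calls within a test.
class HarnessSketch {
    private FakeFileSystem fileSystem;

    // Lazily create the handle; reuse it until it is closed.
    FakeFileSystem fs() {
        if (fileSystem == null) {
            fileSystem = new FakeFileSystem();
        }
        return fileSystem;
    }

    // Would be annotated with @AfterEach in a JUnit harness.
    void closeFilesystem() {
        if (fileSystem != null) {
            fileSystem.close();
            // Reset the field so the next test gets a fresh handle
            // instead of reusing a closed one.
            fileSystem = null;
        }
    }
}
```

Without the reset, a later call to the accessor would hand back the already closed instance, which is the situation the review comment is guarding against.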






[jira] [Updated] (HUDI-4455) Improve TestHiveSyncTool and related test classes

2022-07-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4455:
-
Labels: pull-request-available  (was: )

> Improve TestHiveSyncTool and related test classes
> -
>
> Key: HUDI-4455
> URL: https://issues.apache.org/jira/browse/HUDI-4455
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: tests-ci
>Reporter: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>






[GitHub] [hudi] xushiyan opened a new pull request, #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool

2022-07-24 Thread GitBox


xushiyan opened a new pull request, #6202:
URL: https://github.com/apache/hudi/pull/6202

   Improve HiveTestService, HiveTestUtil, and related classes.





[GitHub] [hudi] hudi-bot commented on pull request #6201: reuse FileSystem in SparkClientFunctionalTestHarness and close it aft…

2022-07-24 Thread GitBox


hudi-bot commented on PR #6201:
URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193413279

   
   ## CI report:
   
   * ee5654e47b5c8b837073c2e83464163a25d9dc72 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10288)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


hudi-bot commented on PR #5943:
URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193413199

   
   ## CI report:
   
   * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN
   * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN
   * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN
   * e5c73240ef14486c14af348269616a1846b487a9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10282)
 
   * 193bafdc92afe1e410b5e58ef59ab46fd9fd4fb9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10287)
 
   
   



[jira] [Closed] (HUDI-4437) resolve conflicts between TestHiveSyncGlobalCommitTool and TestHiveSyncTool

2022-07-24 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-4437.

 Reviewers: Raymond Xu
Resolution: Fixed

> resolve conflicts between TestHiveSyncGlobalCommitTool and TestHiveSyncTool
> ---
>
> Key: HUDI-4437
> URL: https://issues.apache.org/jira/browse/HUDI-4437
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: meta-sync
>Reporter: Jian Feng
>Assignee: Jian Feng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>






[jira] [Updated] (HUDI-4437) resolve conflicts between TestHiveSyncGlobalCommitTool and TestHiveSyncTool

2022-07-24 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-4437:
-
Fix Version/s: 0.12.0

> resolve conflicts between TestHiveSyncGlobalCommitTool and TestHiveSyncTool
> ---
>
> Key: HUDI-4437
> URL: https://issues.apache.org/jira/browse/HUDI-4437
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: meta-sync
>Reporter: Jian Feng
>Assignee: Jian Feng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>






[GitHub] [hudi] hudi-bot commented on pull request #6201: reuse FileSystem in SparkClientFunctionalTestHarness and close it aft…

2022-07-24 Thread GitBox


hudi-bot commented on PR #6201:
URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193412428

   
   ## CI report:
   
   * ee5654e47b5c8b837073c2e83464163a25d9dc72 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


hudi-bot commented on PR #5943:
URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193412308

   
   ## CI report:
   
   * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN
   * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN
   * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN
   * e5c73240ef14486c14af348269616a1846b487a9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10282)
 
   * 193bafdc92afe1e410b5e58ef59ab46fd9fd4fb9 UNKNOWN
   
   



[jira] [Created] (HUDI-4455) Improve TestHiveSyncTool and related test classes

2022-07-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-4455:


 Summary: Improve TestHiveSyncTool and related test classes
 Key: HUDI-4455
 URL: https://issues.apache.org/jira/browse/HUDI-4455
 Project: Apache Hudi
  Issue Type: Improvement
  Components: tests-ci
Reporter: Raymond Xu
 Fix For: 0.12.0








[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193411468

   
   ## CI report:
   
   * dbac26f88b14a8df88eba2ca70d566f2db53e412 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10286)
 
   
   



[jira] [Commented] (HUDI-3822) Fail metadata table validation early for mismatch file slice if timeline has no inflight instant

2022-07-24 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570566#comment-17570566
 ] 

Raymond Xu commented on HUDI-3822:
--

[~guoyihua] Not sure if this is resolved. Can you confirm, please?

> Fail metadata table validation early for mismatch file slice if timeline has 
> no inflight instant
> 
>
> Key: HUDI-3822
> URL: https://issues.apache.org/jira/browse/HUDI-3822
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Bowen Zhu
>Priority: Minor
> Fix For: 0.12.0
>
>
> https://github.com/apache/hudi/pull/5234/files/700f80ec372c2a75cf75754f68d6ee2eb0e7fe3b#diff-67533f5d7bf0e672db06b465b914e313cd197ef9a1648f663e1da625df753eac
> We can check the data table timeline for any inflight instants and, if it's 
> committed in the MDT, then proceed with further checks.





[jira] [Assigned] (HUDI-3822) Fail metadata table validation early for mismatch file slice if timeline has no inflight instant

2022-07-24 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3822:


Assignee: Bowen Zhu

> Fail metadata table validation early for mismatch file slice if timeline has 
> no inflight instant
> 
>
> Key: HUDI-3822
> URL: https://issues.apache.org/jira/browse/HUDI-3822
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Bowen Zhu
>Priority: Minor
> Fix For: 0.12.0
>
>
> https://github.com/apache/hudi/pull/5234/files/700f80ec372c2a75cf75754f68d6ee2eb0e7fe3b#diff-67533f5d7bf0e672db06b465b914e313cd197ef9a1648f663e1da625df753eac
> We can check the data table timeline for any inflight instants and, if it's 
> committed in the MDT, then proceed with further checks.





[jira] [Assigned] (HUDI-2118) Avoid checking corrupt log blocks for cloud storage

2022-07-24 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-2118:


Assignee: Bowen Zhu

> Avoid checking corrupt log blocks for cloud storage
> ---
>
> Key: HUDI-2118
> URL: https://issues.apache.org/jira/browse/HUDI-2118
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Rajesh Mahindra
>Assignee: Bowen Zhu
>Priority: Minor
>






[GitHub] [hudi] xushiyan commented on pull request #6197: [HUDI-4071] Match ROLLBACK_USING_MARKERS_ENABLE in sql as datasource

2022-07-24 Thread GitBox


xushiyan commented on PR #6197:
URL: https://github.com/apache/hudi/pull/6197#issuecomment-1193409613

   @XuQianJin-Stars Some scenarios in the call procedures probably do not 
support marker-based rollback (I have not dug into the failures myself). If you 
have time, please help check this. The option should be set in each subclass of 
the base procedure that does not support markers, rather than at the base level.
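
The per-subclass approach suggested above might look like the following sketch. All class and method names are hypothetical, the base default of `true` is an assumption made for illustration, and the real procedures are Scala classes rather than Java.

```java
// Hypothetical base procedure: leaves marker-based rollback at an
// assumed default of enabled, so most procedures need no override.
abstract class BaseProcedureSketch {
    boolean rollbackUsingMarkers() {
        return true;
    }
}

// A procedure that works with markers inherits the base behaviour.
class MarkerFriendlyProcedureSketch extends BaseProcedureSketch { }

// A procedure that cannot use markers opts out locally, so the
// restriction does not leak into every other procedure.
class NoMarkerProcedureSketch extends BaseProcedureSketch {
    @Override
    boolean rollbackUsingMarkers() {
        return false;
    }
}
```

Keeping the override in the subclass makes each exception visible next to the procedure that needs it.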





[GitHub] [hudi] xushiyan commented on a diff in pull request #6201: reuse FileSystem in SparkClientFunctionalTestHarness and close it aft…

2022-07-24 Thread GitBox


xushiyan commented on code in PR #6201:
URL: https://github.com/apache/hudi/pull/6201#discussion_r928321553


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java:
##
@@ -208,6 +213,13 @@ public static synchronized void resetSpark() {
     }
   }
 
+  @AfterEach
+  public void closeFilesystem() throws IOException {
+    if (fileSystem != null) {
+      fileSystem.close();

Review Comment:
   set it to null?






[GitHub] [hudi] the-other-tim-brown opened a new pull request, #6201: reuse FileSystem in SparkClientFunctionalTestHarness and close it aft…

2022-07-24 Thread GitBox


the-other-tim-brown opened a new pull request, #6201:
URL: https://github.com/apache/hudi/pull/6201

   …er test
   
   





[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193402953

   
   ## CI report:
   
   * 1a2d20c64958d09d8c9407e32cdb892ee4669d1b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10285)
 
   * dbac26f88b14a8df88eba2ca70d566f2db53e412 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10286)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193402281

   
   ## CI report:
   
   * 1a2d20c64958d09d8c9407e32cdb892ee4669d1b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10285)
 
   * dbac26f88b14a8df88eba2ca70d566f2db53e412 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193394783

   
   ## CI report:
   
   * 1a2d20c64958d09d8c9407e32cdb892ee4669d1b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10285)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193392924

   
   ## CI report:
   
   * a3cc6e44d568b0f69b1c6b50e91fd6dcddfe5245 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10276)
   * 1a2d20c64958d09d8c9407e32cdb892ee4669d1b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10285)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-4441) Disable INFO level logs from tests

2022-07-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4441:
-
Labels: pull-request-available  (was: )

> Disable INFO level logs from tests
> --
>
> Key: HUDI-4441
> URL: https://issues.apache.org/jira/browse/HUDI-4441
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
>
> Since the log4j1-2 bridge upgrade, we have noticed that CI runs are logging 
> INFO level logs despite the minimum level being set to WARN in all 
> log4j-sure.properties files. To reproduce the issue, run any test locally and 
> you should see INFO level logs. This creates unnecessary noise and makes 
> failures painful to debug. We need to fix this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-07-24 Thread GitBox


hudi-bot commented on PR #6170:
URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193392307

   
   ## CI report:
   
   * a3cc6e44d568b0f69b1c6b50e91fd6dcddfe5245 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10276)
   * 1a2d20c64958d09d8c9407e32cdb892ee4669d1b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


hudi-bot commented on PR #5943:
URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193373310

   
   ## CI report:
   
   * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN
   * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN
   * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN
   * e5c73240ef14486c14af348269616a1846b487a9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10282)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


hudi-bot commented on PR #5943:
URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193372703

   
   ## CI report:
   
   * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN
   * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN
   * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN
   * e5c73240ef14486c14af348269616a1846b487a9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10282)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] CTTY commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


CTTY commented on PR #5943:
URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193372326

   @hudi-bot run azure





[GitHub] [hudi] CTTY commented on a diff in pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


CTTY commented on code in PR #5943:
URL: https://github.com/apache/hudi/pull/5943#discussion_r928297354


##
hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/parser/HoodieSpark3_3ExtendedSqlAstBuilder.scala:
##
@@ -0,0 +1,3351 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.parser
+
+import org.antlr.v4.runtime.tree.{ParseTree, RuleNode, TerminalNode}
+import org.antlr.v4.runtime.{ParserRuleContext, Token}
+import org.apache.hudi.spark.sql.parser.HoodieSqlBaseParser._
+import org.apache.hudi.spark.sql.parser.{HoodieSqlBaseBaseVisitor, HoodieSqlBaseParser}
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogStorageFormat}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.{First, Last}
+import org.apache.spark.sql.catalyst.parser.ParserUtils.{EnhancedLogicalPlan, checkDuplicateClauses, checkDuplicateKeys, entry, escapedIdentifier, operationNotAllowed, source, string, stringWithoutUnescape, validate, withOrigin}
+import org.apache.spark.sql.catalyst.parser.{ParseException, ParserInterface}
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.util.DateTimeUtils._
+import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, IntervalUtils, truncatedString}
+import org.apache.spark.sql.catalyst.{FunctionIdentifier, TableIdentifier}
+import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.BucketSpecHelper
+import org.apache.spark.sql.connector.catalog.TableCatalog
+import org.apache.spark.sql.connector.catalog.TableChange.ColumnPosition
+import org.apache.spark.sql.connector.expressions.{ApplyTransform, BucketTransform, DaysTransform, FieldReference, HoursTransform, IdentityTransform, LiteralValue, MonthsTransform, Transform, YearsTransform, Expression => V2Expression}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}
+import org.apache.spark.util.Utils.isTesting
+import org.apache.spark.util.random.RandomSampler
+
+import java.util.Locale
+import java.util.concurrent.TimeUnit
+import javax.xml.bind.DatatypeConverter
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+/**
+ * The AstBuilder for HoodieSqlParser to parser the AST tree to Logical Plan.
+ * Here we only do the parser for the extended sql syntax. e.g MergeInto. For
+ * other sql syntax we use the delegate sql parser which is the SparkSqlParser.
+ */
+class HoodieSpark3_3ExtendedSqlAstBuilder(conf: SQLConf, delegate: ParserInterface)
+  extends HoodieSqlBaseBaseVisitor[AnyRef] with Logging {
+
+  protected def typedVisit[T](ctx: ParseTree): T = {
+    ctx.accept(this).asInstanceOf[T]
+  }
+
+  /**
+   * Override the default behavior for all visit methods. This will only return a non-null result
+   * when the context has only one child. This is done because there is no generic method to
+   * combine the results of the context children. In all other cases null is returned.
+   */
+  override def visitChildren(node: RuleNode): AnyRef = {
+    if (node.getChildCount == 1) {
+      node.getChild(0).accept(this)
+    } else {
+      null
+    }
+  }
+
+  /**
+   * Create an aliased table reference. This is typically used in FROM clauses.
+   */
+  override def visitTableName(ctx: TableNameContext): LogicalPlan = withOrigin(ctx) {
+    val tableId = visitMultipartIdentifier(ctx.multipartIdentifier())
+    val relation = UnresolvedRelation(tableId)
+    val table = mayApplyAliasPlan(
+      ctx.tableAlias, relation.optionalMap(ctx.temporalClause)(withTimeTravel))
+    table.optionalMap(ctx.sample)(withSample)
+  }
+
+  private def withTimeTravel(
+      ctx: TemporalClauseContext, plan: LogicalPlan): LogicalPlan = withOrigin(ctx) {

Review Comment:
   Same as above. We can file a follow-up PR to fix all of that logic later.




[GitHub] [hudi] CTTY commented on a diff in pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0

2022-07-24 Thread GitBox


CTTY commented on code in PR #5943:
URL: https://github.com/apache/hudi/pull/5943#discussion_r928297202


##
hudi-spark-datasource/hudi-spark3.3.x/src/main/antlr4/imports/SqlBase.g4:
##
@@ -0,0 +1,1908 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * This file is an adaptation of Presto's presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4 grammar.
+ */
+
+// The parser file is forked from spark 3.2.0's SqlBase.g4.

Review Comment:
   Those .g4 files were refactored and changed substantially in Spark 3.3, e.g. 
https://github.com/apache/spark/pull/35701
   I don't think we need to port those changes back to Hudi, as they are 
going to be removed soon.






[GitHub] [hudi] hudi-bot commented on pull request #6200: [HUDI-3510] Add sync validate procedure

2022-07-24 Thread GitBox


hudi-bot commented on PR #6200:
URL: https://github.com/apache/hudi/pull/6200#issuecomment-1193337289

   
   ## CI report:
   
   * dd1e2d2ae53c9ecb8333ae73b0a6d63f55393b86 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10283)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




