[GitHub] [hudi] danny0405 commented on a diff in pull request #6093: [HUDI-4385] Support to trigger the compaction in the flink batch mode write.
danny0405 commented on code in PR #6093: URL: https://github.com/apache/hudi/pull/6093#discussion_r927388993 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java: ## @@ -95,6 +95,10 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context context) { pipeline = Pipelines.hoodieStreamWrite(conf, parallelism, hoodieRecordDataStream); // compaction if (OptionsResolver.needsAsyncCompaction(conf)) { +// batch mode write must use syncCompaction. +if (context.isBounded()) { + conf.setBoolean(FlinkOptions.COMPACTION_ASYNC_ENABLED, false); Review Comment: In streaming execution mode, a bounded source would also trigger checkpoints. Should we disable async compaction in that case? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a diff in pull request #6093: [HUDI-4385] Support to trigger the compaction in the flink batch mode write.
danny0405 commented on code in PR #6093: URL: https://github.com/apache/hudi/pull/6093#discussion_r928524041 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java: ## @@ -95,6 +95,10 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context context) { pipeline = Pipelines.hoodieStreamWrite(conf, parallelism, hoodieRecordDataStream); // compaction if (OptionsResolver.needsAsyncCompaction(conf)) { +// batch mode write must use syncCompaction. +if (context.isBounded()) { + conf.setBoolean(FlinkOptions.COMPACTION_ASYNC_ENABLED, false); Review Comment: Not exactly, because the bounded source can also be long running.
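The point raised in this review thread can be made concrete with a small sketch. It is not the Hudi implementation and not the resolution adopted in the PR: `CompactionDecision`, `ExecutionMode`, `asyncByBoundedness`, and `asyncByExecutionMode` are hypothetical names. The sketch assumes that async compaction relies on checkpoints and that checkpoints are unavailable only in batch execution mode, so it contrasts a rule keyed on boundedness (as in the diff) with a rule keyed on the execution mode.

```java
// Hypothetical sketch; none of these names come from Hudi or Flink.
final class CompactionDecision {
  enum ExecutionMode { BATCH, STREAMING }

  private CompactionDecision() {}

  /** Rule as written in the diff: any bounded input falls back to sync compaction. */
  static boolean asyncByBoundedness(boolean asyncRequested, boolean boundedSource) {
    return asyncRequested && !boundedSource;
  }

  /**
   * Alternative rule suggested by the review question: only batch execution
   * (assumed here to run without checkpoints) falls back to sync compaction.
   */
  static boolean asyncByExecutionMode(boolean asyncRequested, ExecutionMode mode) {
    return asyncRequested && mode == ExecutionMode.STREAMING;
  }

  public static void main(String[] args) {
    // A bounded source running in streaming mode is where the two rules disagree.
    System.out.println("by boundedness:    " + asyncByBoundedness(true, true));
    System.out.println("by execution mode: " + asyncByExecutionMode(true, ExecutionMode.STREAMING));
  }
}
```

The two rules differ only for a bounded source executed in streaming mode, which is exactly the case the reviewer asks about.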
[GitHub] [hudi] danny0405 commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.
danny0405 commented on PR #5629: URL: https://github.com/apache/hudi/pull/5629#issuecomment-1193649030 > I am not very persuaded by the improvement numbers: 33% for reads and 9% for writes. If the numbers are real and reproducible, I would suggest lowering the priority of the patch, for example to after release 1.0.0. I had expected about a 5x ~ 10x performance improvement, BTW.
[GitHub] [hudi] wzx140 commented on a diff in pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.
wzx140 commented on code in PR #5629: URL: https://github.com/apache/hudi/pull/5629#discussion_r928518937 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/table/log/HoodieFileSliceReader.java: ## @@ -21,64 +21,46 @@ import org.apache.hudi.common.model.HoodieRecord; import org.apache.hudi.common.util.Option; -import org.apache.hudi.common.util.SpillableMapUtils; import org.apache.hudi.common.util.collection.Pair; -import org.apache.hudi.config.HoodiePayloadConfig; import org.apache.hudi.exception.HoodieIOException; -import org.apache.hudi.io.storage.HoodieAvroFileReader; +import org.apache.hudi.io.storage.HoodieFileReader; import org.apache.avro.Schema; -import org.apache.avro.generic.GenericRecord; import java.io.IOException; import java.util.Iterator; +import java.util.Properties; import java.util.stream.StreamSupport; /** * Reads records from base file and merges any updates from log files and provides iterable over all records in the file slice. */ public class HoodieFileSliceReader implements Iterator> { + private final Iterator> recordsIterator; public static HoodieFileSliceReader getFileSliceReader( - Option baseFileReader, HoodieMergedLogRecordScanner scanner, Schema schema, String payloadClass, - String preCombineField, Option> simpleKeyGenFieldsOpt) throws IOException { + Option baseFileReader, HoodieMergedLogRecordScanner scanner, Schema schema, Properties props, Option> simpleKeyGenFieldsOpt) throws IOException { if (baseFileReader.isPresent()) { - Iterator baseIterator = baseFileReader.get().getRecordIterator(schema); + Iterator baseIterator = baseFileReader.get().getRecordIterator(schema); while (baseIterator.hasNext()) { -GenericRecord record = (GenericRecord) baseIterator.next(); -HoodieRecord hoodieRecord = transform( -record, scanner, payloadClass, preCombineField, simpleKeyGenFieldsOpt); -scanner.processNextRecord(hoodieRecord); +scanner.processNextRecord(baseIterator.next().expansion(props, simpleKeyGenFieldsOpt, 
+scanner.isWithOperationField(), scanner.getPartitionName(), false)); } return new HoodieFileSliceReader(scanner.iterator()); } else { Iterable iterable = () -> scanner.iterator(); - HoodiePayloadConfig payloadConfig = HoodiePayloadConfig.newBuilder().withPayloadOrderingField(preCombineField).build(); return new HoodieFileSliceReader(StreamSupport.stream(iterable.spliterator(), false) .map(e -> { try { - GenericRecord record = (GenericRecord) e.toIndexedRecord(schema, payloadConfig.getProps()).get(); - return transform(record, scanner, payloadClass, preCombineField, simpleKeyGenFieldsOpt); + return e.expansion(props, simpleKeyGenFieldsOpt, scanner.isWithOperationField(), scanner.getPartitionName(), false); Review Comment: I looked at this carefully and found that the expansion function is still necessary here. I also changed the function names: expansion -> getKeyWithParams and transform -> getKeyWithKeyGen.
[jira] [Updated] (HUDI-4458) Add a converter cache for flink ColumnStatsIndices
[ https://issues.apache.org/jira/browse/HUDI-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4458: - Labels: pull-request-available (was: ) > Add a converter cache for flink ColumnStatsIndices > -- > > Key: HUDI-4458 > URL: https://issues.apache.org/jira/browse/HUDI-4458 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] danny0405 opened a new pull request, #6205: [HUDI-4458] Add a converter cache for flink ColumnStatsIndices
danny0405 opened a new pull request, #6205: URL: https://github.com/apache/hudi/pull/6205
[jira] [Created] (HUDI-4459) Corrupt parquet file created when syncing huge table with 4000+ fields,using hudi cow table with bulk_insert type
Leo zhang created HUDI-4459:
---

Summary: Corrupt parquet file created when syncing huge table with 4000+ fields,using hudi cow table with bulk_insert type
Key: HUDI-4459
URL: https://issues.apache.org/jira/browse/HUDI-4459
Project: Apache Hudi
Issue Type: Bug
Reporter: Leo zhang
Attachments: statements.sql, table.ddl

I am trying to sync a huge table with 4000+ fields into Hudi, using a COW table with the bulk_insert operation type. The job finishes without any exception, but when I try to read data from the table, I get an empty result. The parquet file is corrupted and cannot be read correctly.

I traced the problem and found that it is caused by SortOperator. After a record is serialized in the sorter, its fields get out of order and are deserialized into a single field. The wrong record is then written into the parquet file, which makes the file unreadable.

Steps to reproduce the bug in the Flink sql-client:
1. Execute the table DDL (provided in the attached table.ddl file).
2. Execute the insert statement (provided in the attached statements.sql file).
3. Execute a select statement to query the Hudi table (provided in the attached statements.sql file).
[jira] [Created] (HUDI-4458) Add a converter cache for flink ColumnStatsIndices
Danny Chen created HUDI-4458: Summary: Add a converter cache for flink ColumnStatsIndices Key: HUDI-4458 URL: https://issues.apache.org/jira/browse/HUDI-4458 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: Danny Chen Fix For: 0.12.0
[GitHub] [hudi] hudi-bot commented on pull request #6203: [HUDI-4456] Clean up test resources
hudi-bot commented on PR #6203: URL: https://github.com/apache/hudi/pull/6203#issuecomment-1193617884 ## CI report: * 745324e449ab6c81eabd274bfbb15a8d5fb3918e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10300) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool
hudi-bot commented on PR #6202: URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193617853 ## CI report: * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN * 45a5851255b57276491a3a8914783fefdc5563cc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10295)
[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
hudi-bot commented on PR #5943: URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193617474 ## CI report: * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN * 9eece632cdd0f0c55fc81742586d8ef3ecbb769a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10296) * c015e22540af7ea164c1216874e37202b8cae10e UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6203: [HUDI-4456] Clean up test resources
hudi-bot commented on PR #6203: URL: https://github.com/apache/hudi/pull/6203#issuecomment-1193614845 ## CI report: * 745324e449ab6c81eabd274bfbb15a8d5fb3918e UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6203: [HUDI-4456] Clean up test resources
hudi-bot commented on PR #6203: URL: https://github.com/apache/hudi/pull/6203#issuecomment-1193612560 ## CI report: * b98b402fdadec6c219e1d2a50f76e606ecd1ba75 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10291)
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193609887 ## CI report: * 16ff6fba9e82e35bfb202902f22e6c59ade998ff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10298)
[hudi] branch master updated: [MINOR] Only log stdout output for non-zero exit from commands in IT (#6199)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new f6e7227ed5 [MINOR] Only log stdout output for non-zero exit from commands in IT (#6199)
f6e7227ed5 is described below

commit f6e7227ed548ea5bac66e224df42e2985fb814a9
Author: Y Ethan Guo
AuthorDate: Sun Jul 24 22:08:33 2022 -0700

    [MINOR] Only log stdout output for non-zero exit from commands in IT (#6199)
---
 hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java b/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
index 8115d50a78..dcb6367802 100644
--- a/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
+++ b/hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java
@@ -236,7 +236,9 @@ public abstract class ITTestBase {
     int exitCode = dockerClient.inspectExecCmd(createCmdResponse.getId()).exec().getExitCode();
     LOG.info("Exit code for command : " + exitCode);
-    LOG.error("\n\n ## Stdout ###\n" + callback.getStdout().toString());
+    if (exitCode != 0) {
+      LOG.error("\n\n ## Stdout ###\n" + callback.getStdout().toString());
+    }
     LOG.error("\n\n ## Stderr ###\n" + callback.getStderr().toString());

     if (checkIfSucceed) {
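The behavior change in this commit can be restated as a self-contained sketch. `CommandOutputReporter` and `linesToLog` are hypothetical names, not part of Hudi, and the sketch returns the lines that would be logged instead of calling a real logger. As in the diff, stdout is reported only for a non-zero exit code, while stderr is always reported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; CommandOutputReporter is not a Hudi class.
final class CommandOutputReporter {
  private CommandOutputReporter() {}

  /** Returns the lines that would be logged for a finished command. */
  static List<String> linesToLog(int exitCode, String stdout, String stderr) {
    List<String> lines = new ArrayList<>();
    lines.add("Exit code for command : " + exitCode);
    if (exitCode != 0) {
      // Stdout can be very large, so it is kept only when the command failed.
      lines.add("Stdout: " + stdout);
    }
    // Stderr is reported regardless of the exit code.
    lines.add("Stderr: " + stderr);
    return lines;
  }

  public static void main(String[] args) {
    System.out.println(linesToLog(0, "lots of output", ""));
    System.out.println(linesToLog(1, "lots of output", "boom"));
  }
}
```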
[GitHub] [hudi] xushiyan merged pull request #6199: [MINOR] Only log stdout output for non-zero exit from commands in IT
xushiyan merged PR #6199: URL: https://github.com/apache/hudi/pull/6199
[GitHub] [hudi] xushiyan commented on pull request #6199: [MINOR] Only log stdout output for non-zero exit from commands in IT
xushiyan commented on PR #6199: URL: https://github.com/apache/hudi/pull/6199#issuecomment-1193583216 https://issues.apache.org/jira/browse/HUDI-4457 @yihua we can follow up on this. Will land this PR. (The CI failure is due to unrelated flakiness.)
[jira] [Updated] (HUDI-4457) Make sure IT docker test return code non-zero when failed
[ https://issues.apache.org/jira/browse/HUDI-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4457: - Description: IT testcase where docker command runs and returns exit code 0, but test actually failed. This will be misleading for troubleshooting. TODO 1. verify the behavior 2. fix it > Make sure IT docker test return code non-zero when failed > - > > Key: HUDI-4457 > URL: https://issues.apache.org/jira/browse/HUDI-4457 > Project: Apache Hudi > Issue Type: Bug > Components: tests-ci >Reporter: Raymond Xu >Priority: Major > > IT testcase where docker command runs and returns exit code 0, but test > actually failed. This will be misleading for troubleshooting. > TODO > 1. verify the behavior > 2. fix it
[jira] [Created] (HUDI-4457) Make sure IT docker test return code non-zero when failed
Raymond Xu created HUDI-4457: Summary: Make sure IT docker test return code non-zero when failed Key: HUDI-4457 URL: https://issues.apache.org/jira/browse/HUDI-4457 Project: Apache Hudi Issue Type: Bug Components: tests-ci Reporter: Raymond Xu
[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
hudi-bot commented on PR #5943: URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193581179 ## CI report: * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN * 9eece632cdd0f0c55fc81742586d8ef3ecbb769a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10296)
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193577362 ## CI report: * 34485e3a7df2712077f5987f930b7a6fa33a3986 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10294) * 16ff6fba9e82e35bfb202902f22e6c59ade998ff Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10298)
[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
hudi-bot commented on PR #5943: URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193577152 ## CI report: * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN * 193bafdc92afe1e410b5e58ef59ab46fd9fd4fb9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10287) * 9eece632cdd0f0c55fc81742586d8ef3ecbb769a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10296)
[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool
hudi-bot commented on PR #6202: URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193575406 ## CI report: * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN * 8ef79398f29f16623e470320af4db1a113d14dab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10290) * 45a5851255b57276491a3a8914783fefdc5563cc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10295)
[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
hudi-bot commented on PR #5943: URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193575153 ## CI report: * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN * 193bafdc92afe1e410b5e58ef59ab46fd9fd4fb9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10287) * 9eece632cdd0f0c55fc81742586d8ef3ecbb769a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193573133 ## CI report: * 34485e3a7df2712077f5987f930b7a6fa33a3986 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10294) * 16ff6fba9e82e35bfb202902f22e6c59ade998ff UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool
hudi-bot commented on PR #6202: URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193573211 ## CI report: * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN * 8ef79398f29f16623e470320af4db1a113d14dab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10290) * 45a5851255b57276491a3a8914783fefdc5563cc UNKNOWN
[hudi] branch master updated: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness (#6201)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 76a28daeb0 [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness (#6201)
76a28daeb0 is described below

commit 76a28daeb08e7192d75dfc447624c827643bef0d
Author: Tim Brown
AuthorDate: Sun Jul 24 21:42:15 2022 -0700

    [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness (#6201)
---
 .../hudi/testutils/SparkClientFunctionalTestHarness.java | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
index f9676c6c47..c58dd178dc 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
@@ -67,6 +67,7 @@ import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SQLContext;
 import org.apache.spark.sql.SparkSession;
 import org.junit.jupiter.api.AfterAll;
+import org.junit.jupiter.api.AfterEach;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.io.TempDir;

@@ -96,6 +97,7 @@ public class SparkClientFunctionalTestHarness implements SparkProvider, HoodieMe
   private static transient JavaSparkContext jsc;
   private static transient HoodieSparkEngineContext context;
   private static transient TimelineService timelineService;
+  private FileSystem fileSystem;

   /**
    * An indicator of the initialization status.
@@ -128,7 +130,10 @@ public class SparkClientFunctionalTestHarness implements SparkProvider, HoodieMe
   }

   public FileSystem fs() {
-    return FSUtils.getFs(basePath(), hadoopConf());
+    if (fileSystem == null) {
+      fileSystem = FSUtils.getFs(basePath(), hadoopConf());
+    }
+    return fileSystem;
   }

   @Override
@@ -208,6 +213,14 @@ public class SparkClientFunctionalTestHarness implements SparkProvider, HoodieMe
     }
   }

+  @AfterEach
+  public void closeFileSystem() throws IOException {
+    if (fileSystem != null) {
+      fileSystem.close();
+      fileSystem = null;
+    }
+  }
+
   protected JavaRDD tagLocation(
       HoodieIndex index, JavaRDD records, HoodieTable table) {
     return HoodieJavaRDD.getJavaRDD(
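The pattern this commit applies (create the resource lazily on first use, reuse it, then close and reset it after each test) can be sketched without JUnit or Hadoop. `LazyCloseable`, `closeIfOpen`, and `FakeFileSystem` are hypothetical names chosen for illustration, and wrapping the close failure in an unchecked exception is a simplification of this sketch, not something the commit does.

```java
import java.util.function.Supplier;

// Hypothetical sketch; LazyCloseable is not a Hudi class.
final class LazyCloseable<T extends AutoCloseable> {
  private final Supplier<T> factory;
  private T instance;

  LazyCloseable(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Creates the resource on first use and returns the cached instance afterwards. */
  T get() {
    if (instance == null) {
      instance = factory.get();
    }
    return instance;
  }

  /** Closes the cached resource, if any, and resets the holder (the role of the @AfterEach hook). */
  void closeIfOpen() {
    if (instance != null) {
      try {
        instance.close();
      } catch (Exception e) {
        // Simplification for the sketch: surface close failures as unchecked.
        throw new IllegalStateException("failed to close resource", e);
      } finally {
        instance = null;
      }
    }
  }
}

final class LazyCloseableDemo {
  /** Stand-in for a real file system handle. */
  static final class FakeFileSystem implements AutoCloseable {
    boolean closed;

    @Override
    public void close() {
      closed = true;
    }
  }

  public static void main(String[] args) {
    LazyCloseable<FakeFileSystem> holder = new LazyCloseable<>(FakeFileSystem::new);
    FakeFileSystem first = holder.get();
    System.out.println("same instance reused: " + (first == holder.get()));
    holder.closeIfOpen();
    System.out.println("closed after cleanup: " + first.closed);
    System.out.println("fresh instance next time: " + (first != holder.get()));
  }
}
```

Resetting the field to null after closing matters: without it, the next call would hand out an already closed resource.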
[GitHub] [hudi] xushiyan merged pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness
xushiyan merged PR #6201: URL: https://github.com/apache/hudi/pull/6201
[hudi] branch master updated: [MINOR] Fix typos in Spark client related classes (#6204)
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 2a08a65f71 [MINOR] Fix typos in Spark client related classes (#6204) 2a08a65f71 is described below commit 2a08a65f719b5c155dde85a0dc318af5033c31d5 Author: Vander <30547463+vande...@users.noreply.github.com> AuthorDate: Mon Jul 25 12:41:42 2022 +0800 [MINOR] Fix typos in Spark client related classes (#6204) --- .../clustering/run/strategy/SingleSparkJobExecutionStrategy.java| 2 +- .../org/apache/hudi/client/utils/SparkInternalSchemaConverter.java | 4 ++-- .../main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java | 2 +- .../org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java | 6 +++--- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SingleSparkJobExecutionStrategy.java b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SingleSparkJobExecutionStrategy.java index 1158d0ada4..bb6d3df5f1 100644 --- a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SingleSparkJobExecutionStrategy.java +++ b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SingleSparkJobExecutionStrategy.java @@ -136,7 +136,7 @@ public abstract class SingleSparkJobExecutionStrategy> performClusteringWithRecordsIterator(final Iterator> records, final int numOutputGroups, final String instantTime, diff --git a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkInternalSchemaConverter.java b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkInternalSchemaConverter.java index 8e086c2927..098870a60a 100644 --- 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkInternalSchemaConverter.java +++ b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkInternalSchemaConverter.java @@ -81,7 +81,7 @@ public class SparkInternalSchemaConverter { public static final String HOODIE_VALID_COMMITS_LIST = "hoodie.valid.commits.list"; /** - * Converts a spark schema to an hudi internal schema. Fields without IDs are kept and assigned fallback IDs. + * Convert a spark schema to an hudi internal schema. Fields without IDs are kept and assigned fallback IDs. * * @param sparkSchema a spark schema * @return a matching internal schema for the provided spark schema @@ -157,7 +157,7 @@ public class SparkInternalSchemaConverter { } /** - * Converts Spark schema to Hudi internal schema, and prune fields. + * Convert Spark schema to Hudi internal schema, and prune fields. * Fields without IDs are kept and assigned fallback IDs. * * @param sparkSchema a pruned spark schema diff --git a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java index fd083f2c89..a6d03eae2b 100644 --- a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java +++ b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkValidatorUtils.java @@ -50,7 +50,7 @@ import java.util.stream.Stream; import scala.collection.JavaConverters; /** - * Spark validator utils to verify and run any precommit validators configured. + * Spark validator utils to verify and run any pre-commit validators configured. 
*/ public class SparkValidatorUtils { private static final Logger LOG = LogManager.getLogger(BaseSparkCommitActionExecutor.class); diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java b/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java index 491c6700c9..9e74d14c04 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java @@ -308,7 +308,7 @@ public class HoodieAvroDataBlock extends HoodieDataBlock { ByteArrayOutputStream baos = new ByteArrayOutputStream(); DataOutputStream output = new DataOutputStream(baos); -// 2. Compress and Write schema out +// 1. Compress and Write schema out byte[] schemaContent = compress(schema.toString()); output.writeInt(schemaContent.length); output.write(schemaContent); @@ -318,10 +318,10 @@ public class HoodieAvroDataBlock extends HoodieDataBlock { recordItr.forEachRemaining(records::add); } -// 3. Write total number of
[GitHub] [hudi] xushiyan merged pull request #6204: [MINOR] Fix typos in Spark client related classes
xushiyan merged PR #6204: URL: https://github.com/apache/hudi/pull/6204 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193570330 ## CI report: * 34485e3a7df2712077f5987f930b7a6fa33a3986 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10294) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness
hudi-bot commented on PR #6201: URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193570427 ## CI report: * 36cc806477cb75f8c168ce0420849886ab5e650f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10289)
[GitHub] [hudi] the-other-tim-brown closed pull request #6190: a simple test
the-other-tim-brown closed pull request #6190: a simple test URL: https://github.com/apache/hudi/pull/6190
[GitHub] [hudi] vanderzh commented on a diff in pull request #6204: Fix typos in Spark client related classes
vanderzh commented on code in PR #6204: URL: https://github.com/apache/hudi/pull/6204#discussion_r928459060 ## .idea/vcs.xml: ## @@ -1,36 +1,6 @@ -
[GitHub] [hudi] xushiyan closed pull request #5643: [HUDI-4071] Change defaults for some of the configs
xushiyan closed pull request #5643: [HUDI-4071] Change defaults for some of the configs URL: https://github.com/apache/hudi/pull/5643
[GitHub] [hudi] xushiyan commented on a diff in pull request #6204: Fix typos in Spark client related classes
xushiyan commented on code in PR #6204: URL: https://github.com/apache/hudi/pull/6204#discussion_r928457355 ## .idea/vcs.xml: ## @@ -1,36 +1,6 @@ -
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193540808 ## CI report: * dbac26f88b14a8df88eba2ca70d566f2db53e412 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10286) * 34485e3a7df2712077f5987f930b7a6fa33a3986 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10294)
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193533627 ## CI report: * dbac26f88b14a8df88eba2ca70d566f2db53e412 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10286) * 34485e3a7df2712077f5987f930b7a6fa33a3986 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6204: Fix typos in Spark client related classes
hudi-bot commented on PR #6204: URL: https://github.com/apache/hudi/pull/6204#issuecomment-1193526000 ## CI report: * 8bee5ca11e11c53a2100097c8106bbff9aaf5871 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10293)
[GitHub] [hudi] hudi-bot commented on pull request #6204: Fix typos in Spark client related classes
hudi-bot commented on PR #6204: URL: https://github.com/apache/hudi/pull/6204#issuecomment-1193520249 ## CI report: * 8bee5ca11e11c53a2100097c8106bbff9aaf5871 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness
hudi-bot commented on PR #6201: URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193520209 ## CI report: * 36cc806477cb75f8c168ce0420849886ab5e650f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10289)
[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness
hudi-bot commented on PR #6201: URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193518013 ## CI report: * 36cc806477cb75f8c168ce0420849886ab5e650f UNKNOWN
[GitHub] [hudi] vanderzh opened a new pull request, #6204: Fix typos in Spark client related classes
vanderzh opened a new pull request, #6204: URL: https://github.com/apache/hudi/pull/6204 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request This PR fixes a few typos in Spark client related classes. ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request This pull request is a trivial rework / code cleanup without any test coverage. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] codope commented on pull request #6098: [HUDI-4389] Make HoodieStreamingSink idempotent
codope commented on PR #6098: URL: https://github.com/apache/hudi/pull/6098#issuecomment-1193500450 > I did not fully understand the bulk insert row writing part. But Can we get it fixed in 0.12 please Yes that's gonna be in 0.12. It's in #6099 but stacked on top of this one. I will decouple the two.
[GitHub] [hudi] vinothchandar commented on a diff in pull request #6098: [HUDI-4389] Make HoodieStreamingSink idempotent
vinothchandar commented on code in PR #6098: URL: https://github.com/apache/hudi/pull/6098#discussion_r928369447 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala: ## @@ -84,20 +96,62 @@ class HoodieStreamingSink(sqlContext: SQLContext, var updatedOptions = options.updated(HoodieWriteConfig.MARKERS_TYPE.key(), MarkerType.DIRECT.name()) // we need auto adjustment enabled for streaming sink since async table services are feasible within the same JVM. updatedOptions = updatedOptions.updated(HoodieWriteConfig.AUTO_ADJUST_LOCK_CONFIGS.key, "true") +// disable row writer bulk insert of write stream +if (options.getOrDefault(OPERATION.key, UPSERT_OPERATION_OPT_VAL).equalsIgnoreCase(BULK_INSERT_OPERATION_OPT_VAL)) { Review Comment: Row writing is a top priority no? Love to understand this more. ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala: ## @@ -247,4 +285,18 @@ class HoodieStreamingSink(sqlContext: SQLContext, writeClient = Option.empty } } + + private def canSkipBatch(batchId: Long): Boolean = { +// get the latest checkpoint from the commit metadata to check if the microbatch has already been prcessed or not +val lastCommit = metaClient.get.getActiveTimeline.getCommitsTimeline.filterCompletedInstants().lastInstant() +if (lastCommit.isPresent) { + val commitMetadata = HoodieCommitMetadata.fromBytes( + metaClient.get.getActiveTimeline.getInstantDetails(lastCommit.get()).get(), classOf[HoodieCommitMetadata]) + val lastCheckpoint = commitMetadata.getMetadata(SinkCheckpointKey) + if (!StringUtils.isNullOrEmpty(lastCheckpoint)) { +latestBatchId = lastCheckpoint.toLong + } +} +latestBatchId >= batchId Review Comment: +1 Might be good to make the data model support multiple values from day 1 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala: ## @@ -48,12 +50,24 @@ class HoodieStreamingSink(sqlContext: SQLContext, private val 
log = LogManager.getLogger(classOf[HoodieStreamingSink]) - private val retryCnt = options.getOrDefault(DataSourceWriteOptions.STREAMING_RETRY_CNT.key, -DataSourceWriteOptions.STREAMING_RETRY_CNT.defaultValue).toInt - private val retryIntervalMs = options.getOrDefault(DataSourceWriteOptions.STREAMING_RETRY_INTERVAL_MS.key, -DataSourceWriteOptions.STREAMING_RETRY_INTERVAL_MS.defaultValue).toLong - private val ignoreFailedBatch = options.getOrDefault(DataSourceWriteOptions.STREAMING_IGNORE_FAILED_BATCH.key, - DataSourceWriteOptions.STREAMING_IGNORE_FAILED_BATCH.defaultValue).toBoolean + private val tablePath = new Path(options.getOrElse("path", "Missing 'path' option")) + private var metaClient: Option[HoodieTableMetaClient] = { +try { + Some(HoodieTableMetaClient.builder().setConf(sqlContext.sparkContext.hadoopConfiguration).setBasePath(tablePath.toString).build()) +} catch { + case _: TableNotFoundException => +log.warn("Ignore TableNotFoundException as it is first microbatch.") +Option.empty +} + } + private val retryCnt = options.getOrDefault(STREAMING_RETRY_CNT.key, +STREAMING_RETRY_CNT.defaultValue).toInt + private val retryIntervalMs = options.getOrDefault(STREAMING_RETRY_INTERVAL_MS.key, +STREAMING_RETRY_INTERVAL_MS.defaultValue).toLong + private val ignoreFailedBatch = options.getOrDefault(STREAMING_IGNORE_FAILED_BATCH.key, Review Comment: TBH I think we should make it fail by default and not ignore. The original author from Apple wanted it that way for them, but it probably does not make sense at this point anymore.
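The idempotency check discussed in this review (`canSkipBatch`, which compares the incoming micro-batch ID with the checkpoint stored in the last completed commit) can be illustrated with a minimal, self-contained sketch. It is written in Java rather than the Scala of the file under review, and `IdempotentBatchSink`, `addBatch`, and `writeCount` are hypothetical names used only for illustration; they are not part of Hudi. The sketch assumes that no batch is skipped when no commit exists yet, and it keeps the checkpoint in memory instead of reading commit metadata.

```java
import java.util.Optional;

// Hypothetical stand-in for a streaming sink; not Hudi's HoodieStreamingSink.
final class IdempotentBatchSink {
    // Batch ID recorded with the latest completed commit; empty before the first commit.
    private Optional<Long> lastCommittedBatchId = Optional.empty();
    // Number of batches actually written (replays are not counted).
    private int writes = 0;

    /** A batch can be skipped when a commit with the same or a later batch ID already completed. */
    boolean canSkipBatch(long batchId) {
        return lastCommittedBatchId.map(last -> last >= batchId).orElse(false);
    }

    /** Writes the batch unless it was already committed; returns true if a write happened. */
    boolean addBatch(long batchId) {
        if (canSkipBatch(batchId)) {
            return false; // replayed batch: nothing to do
        }
        writes++;
        // In a real sink the ID would be persisted atomically with the commit metadata.
        lastCommittedBatchId = Optional.of(batchId);
        return true;
    }

    int writeCount() {
        return writes;
    }
}
```

Replaying batch 0 after it has been committed leaves the write count unchanged, which is the property the checkpoint key is meant to provide. Supporting multiple writers, as suggested in the review, would presumably require one stored checkpoint per writer identifier rather than a single value.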
[GitHub] [hudi] boneanxs commented on a diff in pull request #6028: [HUDI-4355] Bulk insert As Row: Should also repartiiton records if populateMetaFields is false
boneanxs commented on code in PR #6028: URL: https://github.com/apache/hudi/pull/6028#discussion_r928368597 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -523,17 +523,19 @@ object HoodieSparkSqlWriter { val params: mutable.Map[String, String] = collection.mutable.Map(parameters.toSeq: _*) params(HoodieWriteConfig.AVRO_SCHEMA_STRING.key) = schema.toString val writeConfig = DataSourceUtils.createHoodieConfig(schema.toString, path, tblName, mapAsJavaMap(params)) -val bulkInsertPartitionerRows: BulkInsertPartitioner[Dataset[Row]] = if (populateMetaFields) { +val bulkInsertPartitionerRows: BulkInsertPartitioner[Dataset[Row]] = { val userDefinedBulkInsertPartitionerOpt = DataSourceUtils.createUserDefinedBulkInsertPartitionerWithRows(writeConfig) Review Comment: Should we add a new method to `partitioner` to validate that the columns meet its requirements (for example, return mandatoryFields and use them for the check)? Currently, if users set a user-defined partitioner that requires meta fields, we still accept it and do not throw an error...
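The suggestion above (let a partitioner declare its mandatory fields and check them before writing) could look roughly like the following sketch. `FieldAwarePartitioner`, `mandatoryFields`, and `PartitionerValidator` are hypothetical names introduced here for illustration; they are not existing Hudi APIs, and the actual `BulkInsertPartitioner` interface is not modeled.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical interface: a partitioner declares the columns it needs to operate.
interface FieldAwarePartitioner {
    List<String> mandatoryFields();
}

// Hypothetical helper that checks declared fields against the columns actually present.
final class PartitionerValidator {
    /** Returns the mandatory fields missing from the available columns, in declaration order. */
    static Set<String> missingFields(FieldAwarePartitioner partitioner, Set<String> availableColumns) {
        Set<String> missing = new LinkedHashSet<>(partitioner.mandatoryFields());
        missing.removeAll(availableColumns);
        return missing;
    }

    /** Fails fast instead of silently accepting a partitioner whose required columns are absent. */
    static void validate(FieldAwarePartitioner partitioner, Set<String> availableColumns) {
        Set<String> missing = missingFields(partitioner, availableColumns);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Partitioner requires missing fields: " + missing);
        }
    }
}
```

With such a check, a user-defined partitioner that depends on meta fields would be rejected up front when meta fields are not populated, rather than being accepted without an error.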
[GitHub] [hudi] vinothchandar commented on a diff in pull request #6098: [HUDI-4389] Make HoodieStreamingSink idempotent
vinothchandar commented on code in PR #6098: URL: https://github.com/apache/hudi/pull/6098#discussion_r928368463 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala: ## @@ -48,12 +50,24 @@ class HoodieStreamingSink(sqlContext: SQLContext, private val log = LogManager.getLogger(classOf[HoodieStreamingSink]) - private val retryCnt = options.getOrDefault(DataSourceWriteOptions.STREAMING_RETRY_CNT.key, -DataSourceWriteOptions.STREAMING_RETRY_CNT.defaultValue).toInt - private val retryIntervalMs = options.getOrDefault(DataSourceWriteOptions.STREAMING_RETRY_INTERVAL_MS.key, -DataSourceWriteOptions.STREAMING_RETRY_INTERVAL_MS.defaultValue).toLong - private val ignoreFailedBatch = options.getOrDefault(DataSourceWriteOptions.STREAMING_IGNORE_FAILED_BATCH.key, - DataSourceWriteOptions.STREAMING_IGNORE_FAILED_BATCH.defaultValue).toBoolean + private val tablePath = new Path(options.getOrElse("path", "Missing 'path' option")) + private var metaClient: Option[HoodieTableMetaClient] = { +try { + Some(HoodieTableMetaClient.builder().setConf(sqlContext.sparkContext.hadoopConfiguration).setBasePath(tablePath.toString).build()) +} catch { + case _: TableNotFoundException => +log.warn("Ignore TableNotFoundException as it is first microbatch.") +Option.empty +} + } + private val retryCnt = options.getOrDefault(STREAMING_RETRY_CNT.key, +STREAMING_RETRY_CNT.defaultValue).toInt + private val retryIntervalMs = options.getOrDefault(STREAMING_RETRY_INTERVAL_MS.key, +STREAMING_RETRY_INTERVAL_MS.defaultValue).toLong + private val ignoreFailedBatch = options.getOrDefault(STREAMING_IGNORE_FAILED_BATCH.key, +STREAMING_IGNORE_FAILED_BATCH.defaultValue).toBoolean + // This constant serves as the checkpoint key for streaming sink so that each microbatch is processed exactly-once. + private val SinkCheckpointKey = "_streaming_sink_checkpoint" Review Comment: Add a " _ hudi " prefix to the key?
[GitHub] [hudi] LinMingQiang commented on a diff in pull request #6093: [HUDI-4385] Support to trigger the compaction in the flink batch mode write.
LinMingQiang commented on code in PR #6093: URL: https://github.com/apache/hudi/pull/6093#discussion_r928363020 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java: ## @@ -95,6 +95,10 @@ public SinkRuntimeProvider getSinkRuntimeProvider(Context context) { pipeline = Pipelines.hoodieStreamWrite(conf, parallelism, hoodieRecordDataStream); // compaction if (OptionsResolver.needsAsyncCompaction(conf)) { +// batch mode write must use syncCompaction. +if (context.isBounded()) { + conf.setBoolean(FlinkOptions.COMPACTION_ASYNC_ENABLED, false); Review Comment: My idea is that when the source is bounded, we should not do compaction in checkpoint, because compaction will be done once in `endinput`. Am I right?
[GitHub] [hudi] danny0405 commented on a diff in pull request #5643: [HUDI-4071] Change defaults for some of the configs
danny0405 commented on code in PR #5643: URL: https://github.com/apache/hudi/pull/5643#discussion_r928360330 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -349,7 +349,7 @@ public class HoodieWriteConfig extends HoodieConfig { public static final ConfigProperty EMBEDDED_TIMELINE_SERVER_USE_ASYNC_ENABLE = ConfigProperty .key("hoodie.embed.timeline.server.async") - .defaultValue("false") + .defaultValue("true") .withDocumentation("Controls whether or not, the requests to the timeline server are processed in asynchronous fashion, " Review Comment: 30+ commits are too few to reproduce the problem; in #6179, we ran about 2000+ commits to reproduce it. I would suggest you run the same test before switching the flag.
[GitHub] [hudi] danny0405 commented on a diff in pull request #5643: [HUDI-4071] Change defaults for some of the configs
danny0405 commented on code in PR #5643: URL: https://github.com/apache/hudi/pull/5643#discussion_r928360330 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -349,7 +349,7 @@ public class HoodieWriteConfig extends HoodieConfig { public static final ConfigProperty EMBEDDED_TIMELINE_SERVER_USE_ASYNC_ENABLE = ConfigProperty .key("hoodie.embed.timeline.server.async") - .defaultValue("false") + .defaultValue("true") .withDocumentation("Controls whether or not, the requests to the timeline server are processed in asynchronous fashion, " Review Comment: 30+ commits are too few to reproduce the problem; in #6179, we ran about 2000+ commits to reproduce it.
[GitHub] [hudi] jiezi2026 commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"
jiezi2026 commented on issue #5765: URL: https://github.com/apache/hudi/issues/5765#issuecomment-1193472473 We also encountered the same problem with hudi-0.11.1 & spark-3.2.1, and our current temporary workaround is to set hoodie.metadata.enable=false.
[GitHub] [hudi] leesf commented on a diff in pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
leesf commented on code in PR #5943: URL: https://github.com/apache/hudi/pull/5943#discussion_r928353695 ## hudi-spark-datasource/hudi-spark3.2.x/src/main/scala/org/apache/spark/sql/HoodieSpark32CatalystPlanUtils.scala: ## @@ -13,7 +13,7 @@ * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and - * limitations under the License. + * limitations under the License.a Review Comment: please revert this change.
[GitHub] [hudi] leesf commented on a diff in pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
leesf commented on code in PR #5943: URL: https://github.com/apache/hudi/pull/5943#discussion_r928353409 ## hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/adapter/BaseSpark3Adapter.scala: ## @@ -81,23 +80,12 @@ abstract class BaseSpark3Adapter extends SparkAdapter with Logging { } } - override def createExtendedSparkParser: Option[(SparkSession, ParserInterface) => ParserInterface] = { -// since spark3.2.1 support datasourceV2, so we need to a new SqlParser to deal DDL statment -if (SPARK_VERSION.startsWith("3.1")) { - val loadClassName = "org.apache.spark.sql.parser.HoodieSpark312ExtendedSqlParser" - Some { -(spark: SparkSession, delegate: ParserInterface) => { - val clazz = Class.forName(loadClassName, true, Thread.currentThread().getContextClassLoader) - val ctor = clazz.getConstructors.head - ctor.newInstance(spark, delegate).asInstanceOf[ParserInterface] -} - } -} else { - None -} - } - override def createInterpretedPredicate(e: Expression): InterpretedPredicate = { Predicate.createInterpreted(e) } + + override def getQueryParserFromExtendedSqlParser(session: SparkSession, delegate: ParserInterface, Review Comment: Can this method be defined in `SparkAdapter`, with the default implementation being unsupported?
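The pattern asked about in this review (declare the method on the base adapter with a default that reports the operation as unsupported, and override it only where it applies) can be sketched as follows. The sketch uses a Java interface with a default method instead of a Scala trait, and `VersionAdapter`, `LegacyAdapter`, `ExtendedAdapter`, and `parseWithExtendedParser` are hypothetical names; the real `SparkAdapter` signature is not reproduced here.

```java
// Hypothetical adapter interface; names are illustrative, not Hudi's SparkAdapter API.
interface VersionAdapter {
    /** Default behavior: adapters without an extended parser reject the call. */
    default String parseWithExtendedParser(String sql) {
        throw new UnsupportedOperationException("Extended SQL parser is not supported by this adapter");
    }
}

// Inherits the default, so callers get a clear error instead of a missing method.
final class LegacyAdapter implements VersionAdapter {
}

// Overrides the default only for versions that actually support the feature.
final class ExtendedAdapter implements VersionAdapter {
    @Override
    public String parseWithExtendedParser(String sql) {
        // Placeholder result; a real adapter would delegate to a version-specific parser.
        return "parsed:" + sql.trim();
    }
}
```

The design choice is that every adapter exposes the same method, so call sites need no version checks or reflection, while unsupported versions fail with an explicit exception.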
[hudi] branch master updated (a54c963543 -> 1a910fd473)
This is an automated email from the ASF dual-hosted git repository. forwardxu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from a54c963543 [HUDI-4348] fix merge into sql data quality in concurrent scene (#6020) add 1a910fd473 [HUDI-3510] Add sync validate procedure (#6200) No new revisions were added by this update. Summary of changes: ...Command.java => HoodieSyncValidateCommand.java} | 2 +- .../hudi/command/procedures/HoodieProcedures.scala | 1 + .../procedures/ValidateHoodieSyncProcedure.scala | 208 + 3 files changed, 210 insertions(+), 1 deletion(-) rename hudi-cli/src/main/java/org/apache/hudi/cli/commands/{HoodieSyncCommand.java => HoodieSyncValidateCommand.java} (98%) create mode 100644 hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ValidateHoodieSyncProcedure.scala
[GitHub] [hudi] XuQianJin-Stars merged pull request #6200: [HUDI-3510] Add sync validate procedure
XuQianJin-Stars merged PR #6200: URL: https://github.com/apache/hudi/pull/6200
[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness
hudi-bot commented on PR #6201: URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193451342 ## CI report: * 4345133281042a0f46f28765b285aca51a430c1b UNKNOWN * 36cc806477cb75f8c168ce0420849886ab5e650f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10289)
[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
hudi-bot commented on PR #5943: URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193451044 ## CI report: * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN * 193bafdc92afe1e410b5e58ef59ab46fd9fd4fb9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10287)
[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool
hudi-bot commented on PR #6202: URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193431217 ## CI report: * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN * 8ef79398f29f16623e470320af4db1a113d14dab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10290)
[GitHub] [hudi] hudi-bot commented on pull request #6203: [HUDI-4456] Clean up test resources
hudi-bot commented on PR #6203: URL: https://github.com/apache/hudi/pull/6203#issuecomment-1193428355 ## CI report: * b98b402fdadec6c219e1d2a50f76e606ecd1ba75 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10291)
[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool
hudi-bot commented on PR #6202: URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193428343 ## CI report: * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN * 8ef79398f29f16623e470320af4db1a113d14dab Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10290)
[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness
hudi-bot commented on PR #6201: URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193426832 ## CI report: * ee5654e47b5c8b837073c2e83464163a25d9dc72 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10288) * 4345133281042a0f46f28765b285aca51a430c1b UNKNOWN * 36cc806477cb75f8c168ce0420849886ab5e650f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10289) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6203: [HUDI-4456] Clean up test resources
hudi-bot commented on PR #6203: URL: https://github.com/apache/hudi/pull/6203#issuecomment-1193426856 ## CI report: * b98b402fdadec6c219e1d2a50f76e606ecd1ba75 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool
hudi-bot commented on PR #6202: URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193426841 ## CI report: * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN * 8ef79398f29f16623e470320af4db1a113d14dab UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6201: [HUDI-4456] Close FileSystem in SparkClientFunctionalTestHarness
hudi-bot commented on PR #6201: URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193425361 ## CI report: * ee5654e47b5c8b837073c2e83464163a25d9dc72 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10288) * 4345133281042a0f46f28765b285aca51a430c1b UNKNOWN * 36cc806477cb75f8c168ce0420849886ab5e650f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool
hudi-bot commented on PR #6202: URL: https://github.com/apache/hudi/pull/6202#issuecomment-1193425372 ## CI report: * 1a371881a42f251b2080f7adf0830bcaada0e5b2 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-4456) Clean up test resources
[ https://issues.apache.org/jira/browse/HUDI-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4456: - Labels: pull-request-available (was: ) > Clean up test resources > --- > > Key: HUDI-4456 > URL: https://issues.apache.org/jira/browse/HUDI-4456 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: Raymond Xu >Assignee: Timothy Brown >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] xushiyan opened a new pull request, #6203: [HUDI-4456] Clean up test resources
xushiyan opened a new pull request, #6203: URL: https://github.com/apache/hudi/pull/6203 Clean up resources from the local HDFS cluster and ZooKeeper cluster.
[jira] [Created] (HUDI-4456) Clean up test resources
Raymond Xu created HUDI-4456: Summary: Clean up test resources Key: HUDI-4456 URL: https://issues.apache.org/jira/browse/HUDI-4456 Project: Apache Hudi Issue Type: Improvement Components: tests-ci Reporter: Raymond Xu Fix For: 0.12.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-4456) Clean up test resources
[ https://issues.apache.org/jira/browse/HUDI-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-4456: Assignee: Timothy Brown > Clean up test resources > --- > > Key: HUDI-4456 > URL: https://issues.apache.org/jira/browse/HUDI-4456 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: Raymond Xu >Assignee: Timothy Brown >Priority: Major > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-4441) Disable INFO level logs from tests
[ https://issues.apache.org/jira/browse/HUDI-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown reassigned HUDI-4441: --- Assignee: Timothy Brown > Disable INFO level logs from tests > -- > > Key: HUDI-4441 > URL: https://issues.apache.org/jira/browse/HUDI-4441 > Project: Apache Hudi > Issue Type: Task >Reporter: Sagar Sumit >Assignee: Timothy Brown >Priority: Major > Labels: pull-request-available > > Since the log4j1-2 bridge upgrade, we have noticed that CI runs are logging > INFO level logs despite the min level set as WARN in all > log4j-sure.properties. To reproduce the issue just run any test locally and > you should see INFO level logs. This creates unnecessary noise and makes > failures painful to debug. We need to fix this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6201: [minor] Close FileSystem in SparkClientFunctionalTestHarness
hudi-bot commented on PR #6201: URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193415275 ## CI report: * ee5654e47b5c8b837073c2e83464163a25d9dc72 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10288) * 4345133281042a0f46f28765b285aca51a430c1b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] the-other-tim-brown commented on a diff in pull request #6201: [minor] Close FileSystem in SparkClientFunctionalTestHarness
the-other-tim-brown commented on code in PR #6201: URL: https://github.com/apache/hudi/pull/6201#discussion_r928325681 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java: ## @@ -208,6 +213,13 @@ public static synchronized void resetSpark() { } } + @AfterEach + public void closeFilesystem() throws IOException { +if (fileSystem != null) { + fileSystem.close(); Review Comment: Updated to set it to null after close
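The review exchange on #6201 (close the shared file system after each test, then set the field to null so a stale handle is never reused) can be illustrated with a minimal, dependency-free sketch. Everything in it is hypothetical: `TeardownSketch`, `FakeResource`, and `closeCalls` are stand-ins invented for illustration, not Hudi's `SparkClientFunctionalTestHarness` or Hadoop's `FileSystem`. In a real JUnit 5 harness, `closeFilesystem()` would presumably carry `@AfterEach`, as in the diff above.

```java
import java.io.Closeable;

// Hypothetical stand-in for a test harness that lazily creates a shared resource.
class TeardownSketch {

    // Hypothetical stand-in for a file system handle; it only counts close() calls.
    static final class FakeResource implements Closeable {
        int closeCalls = 0;

        @Override
        public void close() {
            closeCalls++;
        }
    }

    private FakeResource fileSystem;

    // Lazily create the resource, so a nulled field yields a fresh instance next time.
    FakeResource fileSystem() {
        if (fileSystem == null) {
            fileSystem = new FakeResource();
        }
        return fileSystem;
    }

    // Teardown step: close the resource and drop the reference.
    // Nulling the field makes a repeated teardown a no-op and prevents
    // later tests from picking up an already-closed handle.
    void closeFilesystem() {
        if (fileSystem != null) {
            fileSystem.close();
            fileSystem = null;
        }
    }

    public static void main(String[] args) {
        TeardownSketch harness = new TeardownSketch();
        FakeResource first = harness.fileSystem();
        harness.closeFilesystem();
        harness.closeFilesystem();               // second call does nothing
        FakeResource second = harness.fileSystem();
        System.out.println(first.closeCalls);    // prints 1
        System.out.println(first != second);     // prints true
    }
}
```

Without the `fileSystem = null` line, the second teardown would call `close()` again and the next accessor call would hand back the closed instance, which seems to be the concern behind the "set it to null?" question.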
[jira] [Updated] (HUDI-4455) Improve TestHiveSyncTool and related test classes
[ https://issues.apache.org/jira/browse/HUDI-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4455: - Labels: pull-request-available (was: ) > Improve TestHiveSyncTool and related test classes > - > > Key: HUDI-4455 > URL: https://issues.apache.org/jira/browse/HUDI-4455 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: Raymond Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] xushiyan opened a new pull request, #6202: [HUDI-4455] Improve test classes for TestHiveSyncTool
xushiyan opened a new pull request, #6202: URL: https://github.com/apache/hudi/pull/6202 Improve HiveTestService, HiveTestUtil, and related classes.
[GitHub] [hudi] hudi-bot commented on pull request #6201: reuse FileSystem in SparkClientFunctionalTestHarness and close it aft…
hudi-bot commented on PR #6201: URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193413279 ## CI report: * ee5654e47b5c8b837073c2e83464163a25d9dc72 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10288) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
hudi-bot commented on PR #5943: URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193413199 ## CI report: * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN * e5c73240ef14486c14af348269616a1846b487a9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10282) * 193bafdc92afe1e410b5e58ef59ab46fd9fd4fb9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10287) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Closed] (HUDI-4437) resolve conflicts between TestHiveSyncGlobalCommitTool and TestHiveSyncTool
[ https://issues.apache.org/jira/browse/HUDI-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-4437. Reviewers: Raymond Xu Resolution: Fixed > resolve conflicts between TestHiveSyncGlobalCommitTool and TestHiveSyncTool > --- > > Key: HUDI-4437 > URL: https://issues.apache.org/jira/browse/HUDI-4437 > Project: Apache Hudi > Issue Type: Improvement > Components: meta-sync >Reporter: Jian Feng >Assignee: Jian Feng >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4437) resolve conflicts between TestHiveSyncGlobalCommitTool and TestHiveSyncTool
[ https://issues.apache.org/jira/browse/HUDI-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4437: - Fix Version/s: 0.12.0 > resolve conflicts between TestHiveSyncGlobalCommitTool and TestHiveSyncTool > --- > > Key: HUDI-4437 > URL: https://issues.apache.org/jira/browse/HUDI-4437 > Project: Apache Hudi > Issue Type: Improvement > Components: meta-sync >Reporter: Jian Feng >Assignee: Jian Feng >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6201: reuse FileSystem in SparkClientFunctionalTestHarness and close it aft…
hudi-bot commented on PR #6201: URL: https://github.com/apache/hudi/pull/6201#issuecomment-1193412428 ## CI report: * ee5654e47b5c8b837073c2e83464163a25d9dc72 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
hudi-bot commented on PR #5943: URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193412308 ## CI report: * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN * e5c73240ef14486c14af348269616a1846b487a9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10282) * 193bafdc92afe1e410b5e58ef59ab46fd9fd4fb9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Created] (HUDI-4455) Improve TestHiveSyncTool and related test classes
Raymond Xu created HUDI-4455: Summary: Improve TestHiveSyncTool and related test classes Key: HUDI-4455 URL: https://issues.apache.org/jira/browse/HUDI-4455 Project: Apache Hudi Issue Type: Improvement Components: tests-ci Reporter: Raymond Xu Fix For: 0.12.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193411468 ## CI report: * dbac26f88b14a8df88eba2ca70d566f2db53e412 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10286) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-3822) Fail metadata table validation early for mismatch file slice if timeline has no inflight instant
[ https://issues.apache.org/jira/browse/HUDI-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570566#comment-17570566 ] Raymond Xu commented on HUDI-3822: -- [~guoyihua] not sure if this is resolved. can you confirm pls? > Fail metadata table validation early for mismatch file slice if timeline has > no inflight instant > > > Key: HUDI-3822 > URL: https://issues.apache.org/jira/browse/HUDI-3822 > Project: Apache Hudi > Issue Type: Task >Reporter: Ethan Guo >Assignee: Bowen Zhu >Priority: Minor > Fix For: 0.12.0 > > > https://github.com/apache/hudi/pull/5234/files/700f80ec372c2a75cf75754f68d6ee2eb0e7fe3b#diff-67533f5d7bf0e672db06b465b914e313cd197ef9a1648f663e1da625df753eac > We can check data table timeline and check if there are any inflights. and if > its committed in MDT and then proceed w/ further checks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-3822) Fail metadata table validation early for mismatch file slice if timeline has no inflight instant
[ https://issues.apache.org/jira/browse/HUDI-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-3822: Assignee: Bowen Zhu > Fail metadata table validation early for mismatch file slice if timeline has > no inflight instant > > > Key: HUDI-3822 > URL: https://issues.apache.org/jira/browse/HUDI-3822 > Project: Apache Hudi > Issue Type: Task >Reporter: Ethan Guo >Assignee: Bowen Zhu >Priority: Minor > Fix For: 0.12.0 > > > https://github.com/apache/hudi/pull/5234/files/700f80ec372c2a75cf75754f68d6ee2eb0e7fe3b#diff-67533f5d7bf0e672db06b465b914e313cd197ef9a1648f663e1da625df753eac > We can check data table timeline and check if there are any inflights. and if > its committed in MDT and then proceed w/ further checks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-2118) Avoid checking corrupt log blocks for cloud storage
[ https://issues.apache.org/jira/browse/HUDI-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-2118: Assignee: Bowen Zhu > Avoid checking corrupt log blocks for cloud storage > --- > > Key: HUDI-2118 > URL: https://issues.apache.org/jira/browse/HUDI-2118 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Rajesh Mahindra >Assignee: Bowen Zhu >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] xushiyan commented on pull request #6197: [HUDI-4071] Match ROLLBACK_USING_MARKERS_ENABLE in sql as datasource
xushiyan commented on PR #6197: URL: https://github.com/apache/hudi/pull/6197#issuecomment-1193409613 @XuQianJin-Stars probably some scenarios in call procedure do not support using marker (i have not dived in the failures myself). if you have time, pls help check this. It should be set for each subclass of base procedure if not supporting marker, instead of at the base level.
[GitHub] [hudi] xushiyan commented on a diff in pull request #6201: reuse FileSystem in SparkClientFunctionalTestHarness and close it aft…
xushiyan commented on code in PR #6201: URL: https://github.com/apache/hudi/pull/6201#discussion_r928321553 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java: ## @@ -208,6 +213,13 @@ public static synchronized void resetSpark() { } } + @AfterEach + public void closeFilesystem() throws IOException { +if (fileSystem != null) { + fileSystem.close(); Review Comment: set it to null?
[GitHub] [hudi] the-other-tim-brown opened a new pull request, #6201: reuse FileSystem in SparkClientFunctionalTestHarness and close it aft…
the-other-tim-brown opened a new pull request, #6201: URL: https://github.com/apache/hudi/pull/6201 …er test
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193402953 ## CI report: * 1a2d20c64958d09d8c9407e32cdb892ee4669d1b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10285) * dbac26f88b14a8df88eba2ca70d566f2db53e412 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10286) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193402281 ## CI report: * 1a2d20c64958d09d8c9407e32cdb892ee4669d1b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10285) * dbac26f88b14a8df88eba2ca70d566f2db53e412 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193394783 ## CI report: * 1a2d20c64958d09d8c9407e32cdb892ee4669d1b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10285) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193392924 ## CI report: * a3cc6e44d568b0f69b1c6b50e91fd6dcddfe5245 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10276) * 1a2d20c64958d09d8c9407e32cdb892ee4669d1b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10285) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-4441) Disable INFO level logs from tests
[ https://issues.apache.org/jira/browse/HUDI-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4441: - Labels: pull-request-available (was: ) > Disable INFO level logs from tests > -- > > Key: HUDI-4441 > URL: https://issues.apache.org/jira/browse/HUDI-4441 > Project: Apache Hudi > Issue Type: Task >Reporter: Sagar Sumit >Priority: Major > Labels: pull-request-available > > Since the log4j1-2 bridge upgrade, we have noticed that CI runs are logging > INFO level logs despite the min level set as WARN in all > log4j-sure.properties. To reproduce the issue just run any test locally and > you should see INFO level logs. This creates unnecessary noise and makes > failures painful to debug. We need to fix this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies
hudi-bot commented on PR #6170: URL: https://github.com/apache/hudi/pull/6170#issuecomment-1193392307 ## CI report: * a3cc6e44d568b0f69b1c6b50e91fd6dcddfe5245 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10276) * 1a2d20c64958d09d8c9407e32cdb892ee4669d1b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
hudi-bot commented on PR #5943: URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193373310 ## CI report: * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN * e5c73240ef14486c14af348269616a1846b487a9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10282) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
hudi-bot commented on PR #5943: URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193372703 ## CI report: * fa048b175c2b3b5a80c6ef8d0b9709097b822cfb UNKNOWN * b94604147edcfc5040b6cf8a1a649e9a0cf1eb2a UNKNOWN * 0fdc1347c43459f3946b27cdf6753e3166ea6055 UNKNOWN * e5c73240ef14486c14af348269616a1846b487a9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10282) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] CTTY commented on pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
CTTY commented on PR #5943: URL: https://github.com/apache/hudi/pull/5943#issuecomment-1193372326 @hudi-bot run azure
[GitHub] [hudi] CTTY commented on a diff in pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
CTTY commented on code in PR #5943: URL: https://github.com/apache/hudi/pull/5943#discussion_r928297354 ## hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/parser/HoodieSpark3_3ExtendedSqlAstBuilder.scala: ## @@ -0,0 +1,3351 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.sql.parser + +import org.antlr.v4.runtime.tree.{ParseTree, RuleNode, TerminalNode} +import org.antlr.v4.runtime.{ParserRuleContext, Token} +import org.apache.hudi.spark.sql.parser.HoodieSqlBaseParser._ +import org.apache.hudi.spark.sql.parser.{HoodieSqlBaseBaseVisitor, HoodieSqlBaseParser} +import org.apache.spark.internal.Logging +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.analysis._ +import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogStorageFormat} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.aggregate.{First, Last} +import org.apache.spark.sql.catalyst.parser.ParserUtils.{EnhancedLogicalPlan, checkDuplicateClauses, checkDuplicateKeys, entry, escapedIdentifier, operationNotAllowed, source, string, stringWithoutUnescape, validate, withOrigin} +import org.apache.spark.sql.catalyst.parser.{ParseException, ParserInterface} +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.util.DateTimeUtils._ +import org.apache.spark.sql.catalyst.util.{CharVarcharUtils, DateTimeUtils, IntervalUtils, truncatedString} +import org.apache.spark.sql.catalyst.{FunctionIdentifier, TableIdentifier} +import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.BucketSpecHelper +import org.apache.spark.sql.connector.catalog.TableCatalog +import org.apache.spark.sql.connector.catalog.TableChange.ColumnPosition +import org.apache.spark.sql.connector.expressions.{ApplyTransform, BucketTransform, DaysTransform, FieldReference, HoursTransform, IdentityTransform, LiteralValue, MonthsTransform, Transform, YearsTransform, Expression => V2Expression} +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String} +import org.apache.spark.util.Utils.isTesting +import 
org.apache.spark.util.random.RandomSampler + +import java.util.Locale +import java.util.concurrent.TimeUnit +import javax.xml.bind.DatatypeConverter +import scala.collection.JavaConverters._ +import scala.collection.mutable.ArrayBuffer + +/** + * The AstBuilder for HoodieSqlParser to parser the AST tree to Logical Plan. + * Here we only do the parser for the extended sql syntax. e.g MergeInto. For + * other sql syntax we use the delegate sql parser which is the SparkSqlParser. + */ +class HoodieSpark3_3ExtendedSqlAstBuilder(conf: SQLConf, delegate: ParserInterface) + extends HoodieSqlBaseBaseVisitor[AnyRef] with Logging { + + protected def typedVisit[T](ctx: ParseTree): T = { +ctx.accept(this).asInstanceOf[T] + } + + /** + * Override the default behavior for all visit methods. This will only return a non-null result + * when the context has only one child. This is done because there is no generic method to + * combine the results of the context children. In all other cases null is returned. + */ + override def visitChildren(node: RuleNode): AnyRef = { +if (node.getChildCount == 1) { + node.getChild(0).accept(this) +} else { + null +} + } + + /** + * Create an aliased table reference. This is typically used in FROM clauses. + */ + override def visitTableName(ctx: TableNameContext): LogicalPlan = withOrigin(ctx) { +val tableId = visitMultipartIdentifier(ctx.multipartIdentifier()) +val relation = UnresolvedRelation(tableId) +val table = mayApplyAliasPlan( + ctx.tableAlias, relation.optionalMap(ctx.temporalClause)(withTimeTravel)) +table.optionalMap(ctx.sample)(withSample) + } + + private def withTimeTravel( + ctx: TemporalClauseContext, plan: LogicalPlan): LogicalPlan = withOrigin(ctx) { Review Comment: Same as above. We can file another PR to fix all of this logic later.
[GitHub] [hudi] CTTY commented on a diff in pull request #5943: [HUDI-4186] Support Hudi with Spark 3.3.0
CTTY commented on code in PR #5943: URL: https://github.com/apache/hudi/pull/5943#discussion_r928297202 ## hudi-spark-datasource/hudi-spark3.3.x/src/main/antlr4/imports/SqlBase.g4: ## @@ -0,0 +1,1908 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * This file is an adaptation of Presto's presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4 grammar. + */ + +// The parser file is forked from spark 3.2.0's SqlBase.g4. Review Comment: Those .g4 files have been refactored and changed a lot in Spark 3.3, e.g. in https://github.com/apache/spark/pull/35701. I don't think we need to port those changes back to Hudi, as they are going to be removed soon.
[GitHub] [hudi] hudi-bot commented on pull request #6200: [HUDI-3510] Add sync validate procedure
hudi-bot commented on PR #6200: URL: https://github.com/apache/hudi/pull/6200#issuecomment-1193337289 ## CI report: * dd1e2d2ae53c9ecb8333ae73b0a6d63f55393b86 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10283) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build