[jira] [Assigned] (HUDI-7824) Fix incremental partitions fetch logic when savepoint is removed for Incr cleaner
[ https://issues.apache.org/jira/browse/HUDI-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-7824: - Assignee: sivabalan narayanan > Fix incremental partitions fetch logic when savepoint is removed for Incr > cleaner > - > > Key: HUDI-7824 > URL: https://issues.apache.org/jira/browse/HUDI-7824 > Project: Apache Hudi > Issue Type: Bug > Components: cleaning >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > With the incremental cleaner, if a savepoint blocks cleanup of a commit > and the cleaner moves ahead w.r.t. the earliest commit to retain, then when the savepoint is > later removed, the cleaner should account for cleaning up that commit. > > Let's ensure the clean planner accounts for all partitions when such a savepoint > removal is detected -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7824] Fixing incr cleaner with savepoint removal [hudi]
hudi-bot commented on PR #11375: URL: https://github.com/apache/hudi/pull/11375#issuecomment-2143278909 ## CI report: * 97933909750b810570745044912e9506bcb0acf2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24181) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7824] Fixing incr cleaner with savepoint removal [hudi]
hudi-bot commented on PR #11375: URL: https://github.com/apache/hudi/pull/11375#issuecomment-2143275364 ## CI report: * 97933909750b810570745044912e9506bcb0acf2 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]
hudi-bot commented on PR #11374: URL: https://github.com/apache/hudi/pull/11374#issuecomment-2143273073 ## CI report: * 05fab0df29530420f0a77abf46be996b70c1bc25 UNKNOWN * 6abd40f1b77feb86cdc95d58cd2285c546a1f63e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24180) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7824] Fixing incr cleaner with savepoint removal [hudi]
nsivabalan commented on code in PR #11375: URL: https://github.com/apache/hudi/pull/11375#discussion_r1623138103 ## hudi-client/hudi-client-common/src/test/resources/mockito-extensions/org.mockito.plugins.MockMaker: ## @@ -0,0 +1 @@ +mock-maker-inline Review Comment: looks like we need this for static mocking to work. Could not get it to work otherwise. https://stackoverflow.com/questions/21105403/mocking-static-methods-with-mockito
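For context, the `mock-maker-inline` resource file above is what enables Mockito's `MockedStatic` API. A minimal, illustrative sketch (not Hudi's actual test code) of static mocking is shown below; it assumes `mockito-inline` (or Mockito 5.x, where the inline mock maker is the default) is on the test classpath, and mocks `LocalDate.now()` purely as an example target:

```java
import java.time.LocalDate;

import org.mockito.MockedStatic;
import org.mockito.Mockito;

public class StaticMockingSketch {
    public static void main(String[] args) {
        // Without the inline mock maker (the mockito-extensions resource file
        // or the mockito-inline artifact), Mockito.mockStatic(...) fails.
        try (MockedStatic<LocalDate> mocked = Mockito.mockStatic(LocalDate.class)) {
            // Stub the static method inside the try-with-resources scope.
            mocked.when(LocalDate::now).thenReturn(LocalDate.of(2024, 5, 31));
            System.out.println(LocalDate.now()); // stubbed value within scope
        }
        // Outside the scope, the real static method is restored.
        System.out.println("restored");
    }
}
```

The try-with-resources scope matters: the static stub is active only until `close()`, which keeps tests from leaking stubbed statics into each other.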
[jira] [Updated] (HUDI-7824) Fix incremental partitions fetch logic when savepoint is removed for Incr cleaner
[ https://issues.apache.org/jira/browse/HUDI-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7824: - Labels: pull-request-available (was: ) > Fix incremental partitions fetch logic when savepoint is removed for Incr > cleaner > - > > Key: HUDI-7824 > URL: https://issues.apache.org/jira/browse/HUDI-7824 > Project: Apache Hudi > Issue Type: Bug > Components: cleaning >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > With the incremental cleaner, if a savepoint blocks cleanup of a commit > and the cleaner moves ahead w.r.t. the earliest commit to retain, then when the savepoint is > later removed, the cleaner should account for cleaning up that commit. > > Let's ensure the clean planner accounts for all partitions when such a savepoint > removal is detected
[PR] [HUDI-7824] Fixing incr cleaner with savepoint removal [hudi]
nsivabalan opened a new pull request, #11375: URL: https://github.com/apache/hudi/pull/11375 ### Change Logs Whenever a savepoint is removed, the cleaner should fall back to cleaning the entire partition list instead of incremental cleaning. We already attempted a fix in https://github.com/apache/hudi/pull/10651, but it had a bug where not all partitions were accounted for. Since the savepoint meta files are deleted on removal, and a savepoint tracks the latest base file in every partition, it makes sense to clean against the entire partition list in that case. ### Impact Whenever a savepoint is removed, the cleaner falls back to cleaning the entire partition list instead of incremental cleaning. ### Risk level (write none, low medium or high below) low. ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
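The fallback rule described in the Change Logs can be sketched as follows. This is a hedged illustration, not Hudi's actual planner API — the class and method names are hypothetical: if any savepoint that existed at the time of the last clean has since been removed, plan against the full partition list; otherwise keep the incrementally fetched partitions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of the savepoint-removal fallback; names are not Hudi's.
public class CleanPlannerSketch {
    static List<String> partitionsToClean(Set<String> savepointsAtLastClean,
                                          Set<String> savepointsNow,
                                          List<String> incrementalPartitions,
                                          List<String> allPartitions) {
        // A savepoint removed since the last clean may unblock a commit whose
        // files live in partitions outside the incremental list, so the
        // planner must consider every partition in that case.
        Set<String> removed = new HashSet<>(savepointsAtLastClean);
        removed.removeAll(savepointsNow);
        return removed.isEmpty() ? incrementalPartitions : allPartitions;
    }

    public static void main(String[] args) {
        // Savepoint "sp_001" existed at the last clean but is gone now:
        // the planner falls back to the full partition list.
        List<String> result = partitionsToClean(
            Set.of("sp_001"), Set.of(),
            List.of("p1"), List.of("p1", "p2", "p3"));
        System.out.println(result.size()); // 3: full list is used
    }
}
```

The earlier fix in PR #10651 reportedly missed some partitions on this path; the point of the rule above is that detection of *any* removed savepoint is enough to force the full listing.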
[jira] [Created] (HUDI-7824) Fix incremental partitions fetch logic when savepoint is removed for Incr cleaner
sivabalan narayanan created HUDI-7824: - Summary: Fix incremental partitions fetch logic when savepoint is removed for Incr cleaner Key: HUDI-7824 URL: https://issues.apache.org/jira/browse/HUDI-7824 Project: Apache Hudi Issue Type: Bug Components: cleaning Reporter: sivabalan narayanan With the incremental cleaner, if a savepoint blocks cleanup of a commit and the cleaner moves ahead w.r.t. the earliest commit to retain, then when the savepoint is later removed, the cleaner should account for cleaning up that commit. Let's ensure the clean planner accounts for all partitions when such a savepoint removal is detected
Re: [PR] [HUDI-5956] Simple repair spark sql dag ui display problem [hudi]
KnightChess commented on code in PR #8233: URL: https://github.com/apache/hudi/pull/8233#discussion_r1623133643 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -123,6 +126,24 @@ object HoodieSparkSqlWriter { streamingWritesParamsOpt: Option[StreamingWriteParams] = Option.empty, hoodieWriteClient: Option[SparkRDDWriteClient[_]] = Option.empty): (Boolean, HOption[String], HOption[String], HOption[String], SparkRDDWriteClient[_], HoodieTableConfig) = { +//TODO reuse DataWritingCommand sparkPlan, reduce the number of sql list in SPARK UI SQL tag, rendering raw DAG Review Comment: @codope Sorry for the late reply. There is no overhead; it just doesn't look clean in the SQL tab. The TODO is hard to fix right now because we rebuild the logical plan in the Hudi command plan. I will open a new PR to track the fix.
Re: [PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]
hudi-bot commented on PR #11374: URL: https://github.com/apache/hudi/pull/11374#issuecomment-2143241323 ## CI report: * 05fab0df29530420f0a77abf46be996b70c1bc25 UNKNOWN * 6abd40f1b77feb86cdc95d58cd2285c546a1f63e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24180) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]
hudi-bot commented on PR #11374: URL: https://github.com/apache/hudi/pull/11374#issuecomment-2143219341 ## CI report: * 05fab0df29530420f0a77abf46be996b70c1bc25 UNKNOWN * 6abd40f1b77feb86cdc95d58cd2285c546a1f63e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]
hudi-bot commented on PR #11374: URL: https://github.com/apache/hudi/pull/11374#issuecomment-2143216104 ## CI report: * 05fab0df29530420f0a77abf46be996b70c1bc25 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-7823) Simplify dependency management on exclusions
[ https://issues.apache.org/jira/browse/HUDI-7823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7823: - Labels: pull-request-available (was: ) > Simplify dependency management on exclusions > > > Key: HUDI-7823 > URL: https://issues.apache.org/jira/browse/HUDI-7823 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Labels: pull-request-available >
[PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]
yihua opened a new pull request, #11374: URL: https://github.com/apache/hudi/pull/11374 ### Change Logs This PR simplifies the dependency management on exclusions by moving the common dependency exclusions to the root POM. ### Impact Simplifies dependency management on exclusions. ### Risk level low ### Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2143187023 ## CI report: * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN * ec6fa62945094d548dce7d7e8e6ef2363ba0d05f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24179) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Created] (HUDI-7823) Simplify dependency management on exclusions
Ethan Guo created HUDI-7823: --- Summary: Simplify dependency management on exclusions Key: HUDI-7823 URL: https://issues.apache.org/jira/browse/HUDI-7823 Project: Apache Hudi Issue Type: Improvement Reporter: Ethan Guo
Re: [PR] [HUDI-7819] Fix OptionsResolver#allowCommitOnEmptyBatch default value bug [hudi]
danny0405 commented on PR #11370: URL: https://github.com/apache/hudi/pull/11370#issuecomment-2143140776 And some UT failures: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=24160&view=logs&j=7601efb9-4019-552e-11ba-eb31b66593b2&t=9688f101-287d-53f4-2a80-87202516f5d0&l=17578
Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]
danny0405 commented on code in PR #11043: URL: https://github.com/apache/hudi/pull/11043#discussion_r1623048400 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BloomFiltersIndexSupport.scala: ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi + +import org.apache.hudi.HoodieConversionUtils.toScalaOption +import org.apache.hudi.common.config.HoodieMetadataConfig +import org.apache.hudi.common.model.FileSlice +import org.apache.hudi.common.table.HoodieTableMetaClient +import org.apache.hudi.metadata.HoodieTableMetadataUtil +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.expressions.Expression +
class BloomFiltersIndexSupport(spark: SparkSession, + metadataConfig: HoodieMetadataConfig, + metaClient: HoodieTableMetaClient) extends RecordLevelIndexSupport(spark, metadataConfig, metaClient) { Review Comment: It's just code reuse, right? The RLI has nothing to do with the bloom_filter index query.
Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]
danny0405 commented on code in PR #11043: URL: https://github.com/apache/hudi/pull/11043#discussion_r1623048159 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestBloomFiltersIndexSupport.scala: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.functional + +import org.apache.hudi.DataSourceWriteOptions._ +import org.apache.hudi.common.config.{HoodieMetadataConfig, TypedProperties} +import org.apache.hudi.common.model.{FileSlice, HoodieTableType} +import org.apache.hudi.common.table.{HoodieTableConfig, HoodieTableMetaClient} +import org.apache.hudi.common.testutils.RawTripTestPayload.recordsToStrings +import org.apache.hudi.config.HoodieWriteConfig +import org.apache.hudi.metadata.HoodieMetadataFileSystemView +import org.apache.hudi.testutils.HoodieSparkClientTestBase +import org.apache.hudi.util.{JFunction, JavaConversions} +import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, HoodieFileIndex} +import org.apache.spark.sql.catalyst.expressions.{AttributeReference, EqualTo, Expression, Literal} +import org.apache.spark.sql.functions.{col, not} +import org.apache.spark.sql.types.StringType +import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession} +import org.junit.jupiter.api.Assertions.assertTrue +import org.junit.jupiter.api.{AfterEach, BeforeEach, Test} +import org.junit.jupiter.params.ParameterizedTest +import org.junit.jupiter.params.provider.EnumSource + +import java.util.concurrent.atomic.AtomicInteger +import java.util.stream.Collectors +import scala.collection.JavaConverters._ +import scala.collection.{JavaConverters, mutable} + +class TestBloomFiltersIndexSupport extends HoodieSparkClientTestBase { + + val sqlTempTable = "hudi_tbl_bloom" + var spark: SparkSession = _ + var instantTime: AtomicInteger = _ + val metadataOpts: Map[String, String] = Map( +HoodieMetadataConfig.ENABLE.key -> "true", +HoodieMetadataConfig.ENABLE_METADATA_INDEX_BLOOM_FILTER.key -> "true", +HoodieMetadataConfig.BLOOM_FILTER_INDEX_FOR_COLUMNS.key -> "_row_key" + ) + val commonOpts: Map[String, String] = Map( +"hoodie.insert.shuffle.parallelism" -> "4", +"hoodie.upsert.shuffle.parallelism" -> "4", +HoodieWriteConfig.TBL_NAME.key -> "hoodie_test", 
+RECORDKEY_FIELD.key -> "_row_key", +PARTITIONPATH_FIELD.key -> "partition", +PRECOMBINE_FIELD.key -> "timestamp", +HoodieTableConfig.POPULATE_META_FIELDS.key -> "true" + ) ++ metadataOpts + var mergedDfList: List[DataFrame] = List.empty + + @BeforeEach + override def setUp(): Unit = { +initPath() +initSparkContexts() +initHoodieStorage() +initTestDataGenerator() + +setTableName("hoodie_test") +initMetaClient() + +instantTime = new AtomicInteger(1) + +spark = sqlContext.sparkSession + } + + @AfterEach + override def tearDown(): Unit = { +cleanupFileSystem() +cleanupSparkContexts() + } + + @ParameterizedTest + @EnumSource(classOf[HoodieTableType]) + def testIndexInitialization(tableType: HoodieTableType): Unit = { +val hudiOpts = commonOpts + (DataSourceWriteOptions.TABLE_TYPE.key -> tableType.name()) +doWriteAndValidateBloomFilters( + hudiOpts, + operation = DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL, + saveMode = SaveMode.Overwrite) + } + + /** + * Test case to do a write with updates and then validate file pruning using bloom filters. + */ + @Test + def testBloomFiltersIndexFilePruning(): Unit = { +var hudiOpts = commonOpts +hudiOpts = hudiOpts + ( + DataSourceReadOptions.ENABLE_DATA_SKIPPING.key -> "true") + +doWriteAndValidateBloomFilters( + hudiOpts, + operation = DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL, + saveMode = SaveMode.Overwrite, + shouldValidate = false) +doWriteAndValidateBloomFilters( + hudiOpts, + operation = DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL, + saveMode = SaveMode.Append) + +createTempTable(hudiOpts) +verifyQueryPredicate(hudiOpts, "_row_key") + } + + private def createTempTable(hudiOpts: Map[String, String]): Unit = { +val readDf = spark.read.format("hudi
Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2143101212 ## CI report: * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN * d504e37ab6cee7d80e53e6daf2df1ef95eea01b7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24169) * ec6fa62945094d548dce7d7e8e6ef2363ba0d05f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24179) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2143089758 ## CI report: * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN * d504e37ab6cee7d80e53e6daf2df1ef95eea01b7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24169) * ec6fa62945094d548dce7d7e8e6ef2363ba0d05f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7814] Exclude unused transitive dependencies that introduce vulnerabilities [hudi]
hudi-bot commented on PR #11364: URL: https://github.com/apache/hudi/pull/11364#issuecomment-2143082242 ## CI report: * 0dc960c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24173) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7814] Exclude unused transitive dependencies that introduce vulnerabilities [hudi]
hudi-bot commented on PR #11364: URL: https://github.com/apache/hudi/pull/11364#issuecomment-2143073580 ## CI report: * 0dc960c61eb43e9c1f1e97cf60d772145e1b2c3e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24178) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24173) * 0dc960c UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
(hudi) branch master updated: [HUDI-7822] Resolve the conflicts between mixed hdfs and local path in Flink tests (#10931)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 16e1adb5b3c [HUDI-7822] Resolve the conflicts between mixed hdfs and local path in Flink tests (#10931) 16e1adb5b3c is described below commit 16e1adb5b3c8e3601044deec8e880ac15ccb74c8 Author: hehuiyuan <471627...@qq.com> AuthorDate: Sat Jun 1 06:34:51 2024 +0800 [HUDI-7822] Resolve the conflicts between mixed hdfs and local path in Flink tests (#10931) Co-authored-by: Y Ethan Guo --- .../hudi/table/catalog/TestHoodieCatalog.java | 21 + 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/catalog/TestHoodieCatalog.java b/hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/catalog/TestHoodieCatalog.java index 98c98bebcce..f6737128698 100644 --- a/hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/catalog/TestHoodieCatalog.java +++ b/hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/catalog/TestHoodieCatalog.java @@ -28,6 +28,7 @@ import org.apache.hudi.common.testutils.HoodieTestUtils; import org.apache.hudi.common.util.Option; import org.apache.hudi.configuration.FlinkOptions; import org.apache.hudi.configuration.HadoopConfigurations; +import org.apache.hudi.exception.HoodieIOException; import org.apache.hudi.exception.HoodieValidationException; import org.apache.hudi.keygen.ComplexAvroKeyGenerator; import org.apache.hudi.keygen.NonpartitionedAvroKeyGenerator; @@ -66,12 +67,14 @@ import org.apache.flink.table.catalog.exceptions.TableAlreadyExistException; import org.apache.flink.table.catalog.exceptions.TableNotExistException; import org.apache.flink.table.types.DataType; import org.apache.flink.table.types.logical.LogicalTypeRoot; +import org.apache.hadoop.fs.FileSystem; import 
org.junit.jupiter.api.AfterEach; import org.junit.jupiter.api.BeforeEach; import org.junit.jupiter.api.Test; import org.junit.jupiter.api.io.TempDir; import java.io.File; +import java.io.IOException; import java.util.ArrayList; import java.util.Arrays; import java.util.Collections; @@ -173,8 +176,12 @@ public class TestHoodieCatalog { streamTableEnv.getConfig().getConfiguration() .setInteger(ExecutionConfigOptions.TABLE_EXEC_RESOURCE_DEFAULT_PARALLELISM, 2); -File catalogPath = new File(tempFile.getPath()); -catalogPath.mkdir(); +try { + FileSystem fs = FileSystem.get(HadoopConfigurations.getHadoopConf(new Configuration())); + fs.mkdirs(new org.apache.hadoop.fs.Path(tempFile.getPath())); +} catch (IOException e) { + throw new HoodieIOException("Failed to create tempFile dir.", e); +} catalog = new HoodieCatalog("hudi", Configuration.fromMap(getDefaultCatalogOption())); catalog.open(); @@ -266,6 +273,7 @@ public class TestHoodieCatalog { // validate key generator for partitioned table HoodieTableMetaClient metaClient = createMetaClient( +new HadoopStorageConfiguration(HadoopConfigurations.getHadoopConf(new Configuration())), catalog.inferTablePath(catalogPathStr, tablePath)); String keyGeneratorClassName = metaClient.getTableConfig().getKeyGeneratorClassName(); assertEquals(keyGeneratorClassName, SimpleAvroKeyGenerator.class.getName()); @@ -283,6 +291,7 @@ public class TestHoodieCatalog { catalog.createTable(singleKeyMultiplePartitionPath, singleKeyMultiplePartitionTable, false); metaClient = createMetaClient( +new HadoopStorageConfiguration(HadoopConfigurations.getHadoopConf(new Configuration())), catalog.inferTablePath(catalogPathStr, singleKeyMultiplePartitionPath)); keyGeneratorClassName = metaClient.getTableConfig().getKeyGeneratorClassName(); assertThat(keyGeneratorClassName, is(ComplexAvroKeyGenerator.class.getName())); @@ -300,6 +309,7 @@ public class TestHoodieCatalog { catalog.createTable(multipleKeySinglePartitionPath, multipleKeySinglePartitionTable, 
false); metaClient = createMetaClient( +new HadoopStorageConfiguration(HadoopConfigurations.getHadoopConf(new Configuration())), catalog.inferTablePath(catalogPathStr, singleKeyMultiplePartitionPath)); keyGeneratorClassName = metaClient.getTableConfig().getKeyGeneratorClassName(); assertThat(keyGeneratorClassName, is(ComplexAvroKeyGenerator.class.getName())); @@ -317,7 +327,9 @@ public class TestHoodieCatalog { catalog.createTable(nonPartitionPath, nonPartitionCatalogTable, false); -metaClient = createMetaClient(catalog.inferTablePath(catalogPathStr, nonPartitionPath)); +metaClient = createMetaClient( +new HadoopStorageConfiguration(HadoopConfigurations.getHadoopConf(new Configuration())), +catalog.inferTab
Re: [PR] [HUDI-7822] Resolve the conflicts between mixed hdfs and local path in Flink tests [hudi]
yihua commented on PR #10931: URL: https://github.com/apache/hudi/pull/10931#issuecomment-2143060956 Azure CI is green (screenshot: https://github.com/apache/hudi/assets/2497195/c82f3d79-edd6-4ae8-838e-8d760f153ed6)
Re: [PR] [HUDI-7822] Resolve the conflicts between mixed hdfs and local path in Flink tests [hudi]
yihua merged PR #10931: URL: https://github.com/apache/hudi/pull/10931
(hudi) branch master updated: [MINOR] Avoid listing files for empty tables (#11155)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new acecf304254 [MINOR] Avoid listing files for empty tables (#11155)
acecf304254 is described below

commit acecf3042549583de31cad176fb500c55bb61700
Author: Tim Brown
AuthorDate: Fri May 31 17:30:14 2024 -0500

    [MINOR] Avoid listing files for empty tables (#11155)
---
 .../hudi/metadata/HoodieBackedTableMetadataWriter.java | 17 -
 .../hudi/table/action/commit/UpsertPartitioner.java    | 18 +++---
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
index 831c2e1882c..604399b7382 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
@@ -83,12 +83,14 @@ import org.slf4j.LoggerFactory;
 import java.io.FileNotFoundException;
 import java.io.IOException;
+import java.util.ArrayDeque;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.LinkedList;
 import java.util.List;
 import java.util.Map;
+import java.util.Queue;
 import java.util.Set;
 import java.util.function.Function;
 import java.util.stream.Collectors;
@@ -761,7 +763,10 @@ public abstract class HoodieBackedTableMetadataWriter implements HoodieTableM
    * @return List consisting of {@code DirectoryInfo} for each partition found.
    */
  private List listAllPartitionsFromFilesystem(String initializationTime, Set pendingDataInstants) {
-    List pathsToList = new LinkedList<>();
+    if (dataMetaClient.getActiveTimeline().countInstants() == 0) {
+      return Collections.emptyList();
+    }
+    Queue pathsToList = new ArrayDeque<>();
     pathsToList.add(new StoragePath(dataWriteConfig.getBasePath()));
     List partitionsToBootstrap = new LinkedList<>();
@@ -773,16 +778,18 @@ public abstract class HoodieBackedTableMetadataWriter implements HoodieTableM
     while (!pathsToList.isEmpty()) {
       // In each round we will list a section of directories
       int numDirsToList = Math.min(fileListingParallelism, pathsToList.size());
+      List pathsToProcess = new ArrayList<>(numDirsToList);
+      for (int i = 0; i < numDirsToList; i++) {
+        pathsToProcess.add(pathsToList.poll());
+      }
       // List all directories in parallel
       engineContext.setJobStatus(this.getClass().getSimpleName(), "Listing " + numDirsToList + " partitions from filesystem");
-      List processedDirectories = engineContext.map(pathsToList.subList(0, numDirsToList), path -> {
+      List processedDirectories = engineContext.map(pathsToProcess, path -> {
         HoodieStorage storage = new HoodieHadoopStorage(path, storageConf);
         String relativeDirPath = FSUtils.getRelativePartitionPath(storageBasePath, path);
         return new DirectoryInfo(relativeDirPath, storage.listDirectEntries(path), initializationTime, pendingDataInstants);
       }, numDirsToList);
-      pathsToList = new LinkedList<>(pathsToList.subList(numDirsToList, pathsToList.size()));
-
       // If the listing reveals a directory, add it to queue. If the listing reveals a hoodie partition, add it to
       // the results.
       for (DirectoryInfo dirInfo : processedDirectories) {
@@ -815,10 +822,10 @@ public abstract class HoodieBackedTableMetadataWriter implements HoodieTableM
    * @return List consisting of {@code DirectoryInfo} for each partition found.
    */
  private List listAllPartitionsFromMDT(String initializationTime, Set pendingDataInstants) throws IOException {
-    List dirinfoList = new LinkedList<>();
     List allPartitionPaths = metadata.getAllPartitionPaths().stream()
         .map(partitionPath -> dataWriteConfig.getBasePath() + StoragePath.SEPARATOR_CHAR + partitionPath).collect(Collectors.toList());
     Map> partitionFileMap = metadata.getAllFilesInPartitions(allPartitionPaths);
+    List dirinfoList = new ArrayList<>(partitionFileMap.size());
     for (Map.Entry> entry : partitionFileMap.entrySet()) {
       dirinfoList.add(new DirectoryInfo(entry.getKey(), entry.getValue(), initializationTime, pendingDataInstants));
     }
diff --git a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
index 09904cd290e..ea125614170 100644
--- a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
+++ b/hudi-client/h
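The patch above replaces a `LinkedList` with `subList`-based copying by an `ArrayDeque` that is polled in fixed-size batches. The following is a minimal, self-contained sketch of that batched breadth-first traversal; the names (`drainBatch`, `listPartitions`, the `children` map) are hypothetical stand-ins, since the real code lists `StoragePath`s through `HoodieStorage` in parallel via the engine context.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class BatchedListingSketch {

  // Drain up to `n` paths from the head of the queue, mirroring the
  // ArrayDeque.poll() loop that replaces LinkedList.subList() copying.
  static List<String> drainBatch(Queue<String> queue, int n) {
    int count = Math.min(n, queue.size());
    List<String> batch = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      batch.add(queue.poll());
    }
    return batch;
  }

  // Breadth-first traversal: `children` maps a directory to its
  // subdirectories (a stand-in for the parallel filesystem listing).
  // Directories without subdirectories play the role of leaf partitions.
  static List<String> listPartitions(String basePath, Map<String, List<String>> children, int batchSize) {
    Queue<String> pathsToList = new ArrayDeque<>();
    pathsToList.add(basePath);
    List<String> partitions = new ArrayList<>();
    while (!pathsToList.isEmpty()) {
      // In each round, list one batch of directories (in Hudi this batch
      // is processed in parallel with parallelism = batchSize).
      for (String dir : drainBatch(pathsToList, batchSize)) {
        List<String> subDirs = children.getOrDefault(dir, List.of());
        if (subDirs.isEmpty()) {
          partitions.add(dir);         // leaf: record as a partition
        } else {
          pathsToList.addAll(subDirs); // directory: keep traversing
        }
      }
    }
    return partitions;
  }
}
```

Polling from an `ArrayDeque` is O(1) per element, whereas rebuilding a `LinkedList` from `subList` views copies the remaining paths on every round.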
Re: [PR] [MINOR] Avoid listing files for empty tables [hudi]
yihua merged PR #11155: URL: https://github.com/apache/hudi/pull/11155
Re: [PR] [MINOR] Avoid listing files for empty tables [hudi]
yihua commented on PR #11155: URL: https://github.com/apache/hudi/pull/11155#issuecomment-2143057842 Azure CI is green. https://github.com/apache/hudi/assets/2497195/189a84c2-9029-43c8-a4f2-e0d93a0d34bc
Re: [PR] [MINOR] Avoid listing files for empty tables [hudi]
yihua commented on PR #11155: URL: https://github.com/apache/hudi/pull/11155#issuecomment-2143057499 Azure CI is green. https://github.com/apache/hudi/assets/2497195/b2e99712-2aa6-47d9-81a6-6fca43217863
Re: [PR] [MINOR] Avoid listing files for empty tables [hudi]
hudi-bot commented on PR #11155: URL: https://github.com/apache/hudi/pull/11155#issuecomment-2143035154 ## CI report: * 3062782a8b6b02da35c82a87c4ffa1f061f22dc3 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24177) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24174) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7822] Resolve the conflicts between mixed hdfs and local path in Flink tests [hudi]
hudi-bot commented on PR #10931: URL: https://github.com/apache/hudi/pull/10931#issuecomment-2143034731 ## CI report: * 3d87799728e8015152212910997bf7e21ca3a40d Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24176)
Re: [PR] [HUDI-7821] Handle case where older proto message is read with new schema [hudi]
hudi-bot commented on PR #11373: URL: https://github.com/apache/hudi/pull/11373#issuecomment-2143027374 ## CI report: * 32abc805a2e2d4764215bd7dea93ce72c0532bec Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24171)
Re: [PR] [HUDI-7816]: Provide SourceProfileSupplier option into the SnapshotLoadQuerySplitter [hudi]
hudi-bot commented on PR #11368: URL: https://github.com/apache/hudi/pull/11368#issuecomment-2143027333 ## CI report: * d945405ab2605efcb2dd86a8fff6f9dc622ae14a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24172)
Re: [PR] [HUDI-7814] Exclude unused transitive dependencies that introduce vulnerabilities [hudi]
hudi-bot commented on PR #11364: URL: https://github.com/apache/hudi/pull/11364#issuecomment-2143027274 ## CI report: * 0dc960c61eb43e9c1f1e97cf60d772145e1b2c3e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24178) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24173)
Re: [PR] [HUDI-7822] Resolve the conflicts between mixed hdfs and local path in Flink tests [hudi]
hudi-bot commented on PR #10931: URL: https://github.com/apache/hudi/pull/10931#issuecomment-2142986046 ## CI report: * e09914c58cede10a0b8efb315837e6e9d34b1d95 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23048) * 3d87799728e8015152212910997bf7e21ca3a40d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24176)
Re: [PR] [MINOR] Avoid listing files for empty tables [hudi]
hudi-bot commented on PR #11155: URL: https://github.com/apache/hudi/pull/11155#issuecomment-2142986396 ## CI report: * c62bc211274fbe2b31dd8d07d7ede8ecae5f6d64 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24093) * 3062782a8b6b02da35c82a87c4ffa1f061f22dc3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24177)
Re: [PR] [MINOR] Avoid listing files for empty tables [hudi]
hudi-bot commented on PR #11155: URL: https://github.com/apache/hudi/pull/11155#issuecomment-2142978738 ## CI report: * c62bc211274fbe2b31dd8d07d7ede8ecae5f6d64 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24093) * 3062782a8b6b02da35c82a87c4ffa1f061f22dc3 UNKNOWN
Re: [PR] [HUDI-7814] Exclude unused transitive dependencies that introduce vulnerabilities [hudi]
hudi-bot commented on PR #11364: URL: https://github.com/apache/hudi/pull/11364#issuecomment-2142979136 ## CI report: * ff1e3d8a934fe1a2c92e341be610516476bf5d7a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24153) * 0dc960c61eb43e9c1f1e97cf60d772145e1b2c3e UNKNOWN
Re: [PR] [HUDI-7822] Resolve the conflicts between mixed hdfs and local path in Flink tests [hudi]
hudi-bot commented on PR #10931: URL: https://github.com/apache/hudi/pull/10931#issuecomment-2142978283 ## CI report: * e09914c58cede10a0b8efb315837e6e9d34b1d95 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23048) * 3d87799728e8015152212910997bf7e21ca3a40d UNKNOWN
[jira] [Updated] (HUDI-7822) Resolve the conflicts between mixed hdfs and local path in Flink tests
[ https://issues.apache.org/jira/browse/HUDI-7822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7822: - Labels: pull-request-available (was: ) > Resolve the conflicts between mixed hdfs and local path in Flink tests > -- > > Key: HUDI-7822 > URL: https://issues.apache.org/jira/browse/HUDI-7822 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7816]: Provide SourceProfileSupplier option into the SnapshotLoadQuerySplitter [hudi]
hudi-bot commented on PR #11368: URL: https://github.com/apache/hudi/pull/11368#issuecomment-2142970542 ## CI report: * 1dde761d4147e9c1a94914759ca0bfd0f7d23ec7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24154) * d945405ab2605efcb2dd86a8fff6f9dc622ae14a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24172)
Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2142969807 ## CI report: * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN * d504e37ab6cee7d80e53e6daf2df1ef95eea01b7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24169)
[jira] [Commented] (HUDI-7822) Resolve the conflicts between mixed hdfs and local path in Flink tests
[ https://issues.apache.org/jira/browse/HUDI-7822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851205#comment-17851205 ] Ethan Guo commented on HUDI-7822: - https://github.com/apache/hudi/pull/10931
[jira] [Created] (HUDI-7822) Resolve the conflicts between mixed hdfs and local path in Flink tests
Ethan Guo created HUDI-7822: --- Summary: Resolve the conflicts between mixed hdfs and local path in Flink tests Key: HUDI-7822 URL: https://issues.apache.org/jira/browse/HUDI-7822 Project: Apache Hudi Issue Type: Bug Reporter: Ethan Guo
[jira] [Updated] (HUDI-7822) Resolve the conflicts between mixed hdfs and local path in Flink tests
[ https://issues.apache.org/jira/browse/HUDI-7822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7822: Fix Version/s: 1.0.0
Re: [PR] [HUDI-7821] Handle case where older proto message is read with new schema [hudi]
hudi-bot commented on PR #11373: URL: https://github.com/apache/hudi/pull/11373#issuecomment-2142920593 ## CI report: * 32abc805a2e2d4764215bd7dea93ce72c0532bec Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24171)
Re: [PR] [HUDI-7816]: Provide SourceProfileSupplier option into the SnapshotLoadQuerySplitter [hudi]
hudi-bot commented on PR #11368: URL: https://github.com/apache/hudi/pull/11368#issuecomment-2142920538 ## CI report: * 1dde761d4147e9c1a94914759ca0bfd0f7d23ec7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24154) * d945405ab2605efcb2dd86a8fff6f9dc622ae14a UNKNOWN
Re: [PR] [HUDI-7718] Try to fetch the latestSourceProfile in HoodieIncrSource [hudi]
yihua commented on code in PR #11175: URL: https://github.com/apache/hudi/pull/11175#discussion_r1622908331 ## hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestHoodieIncrSource.java: ## @@ -344,7 +385,7 @@ private void readAndAssert(IncrSourceHelper.MissingCheckpointStrategy missingChe snapshotCheckPointImplClassOpt.map(className -> properties.setProperty(SnapshotLoadQuerySplitter.Config.SNAPSHOT_LOAD_QUERY_SPLITTER_CLASS_NAME, className)); TypedProperties typedProperties = new TypedProperties(properties); -HoodieIncrSource incrSource = new HoodieIncrSource(typedProperties, jsc(), spark(), new DummySchemaProvider(HoodieTestDataGenerator.AVRO_SCHEMA)); +HoodieIncrSource incrSource = new HoodieIncrSource(typedProperties, jsc(), spark(), metrics, new DefaultStreamContext(new DummySchemaProvider(HoodieTestDataGenerator.AVRO_SCHEMA), sourceProfile)); Review Comment: Could you validate the source parallelism is changed after passing the source profile?
Re: [PR] [HUDI-7718] Try to fetch the latestSourceProfile in HoodieIncrSource [hudi]
yihua commented on code in PR #11175: URL: https://github.com/apache/hudi/pull/11175#discussion_r1622904727 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java: ## @@ -231,7 +243,15 @@ public Pair>, String> fetchNextBatch(Option lastCkpt // Remove Hoodie meta columns except partition path from input source String[] colsToDrop = shouldDropMetaFields ? HoodieRecord.HOODIE_META_COLUMNS.stream().toArray(String[]::new) : HoodieRecord.HOODIE_META_COLUMNS.stream().filter(x -> !x.equals(HoodieRecord.PARTITION_PATH_METADATA_FIELD)).toArray(String[]::new); -final Dataset src = source.drop(colsToDrop); +Dataset src = source.drop(colsToDrop); +if (getLatestSourceProfile().isPresent()) { + src = coalesceOrRepartition(src, getLatestSourceProfile().get().getSourcePartitions()); +} Review Comment: Could `getLatestSourceProfile().map().orElse()` be used instead of reassigning the variable?
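The review suggestion above — returning `opt.map(...).orElse(...)` instead of conditionally reassigning a local — can be illustrated with `java.util.Optional` (Hudi's own `Option` exposes the same `map`/`orElse` shape). The `repartition` helper below is a hypothetical stand-in for `coalesceOrRepartition`:

```java
import java.util.Optional;

class OptionStyleSketch {

  // Hypothetical stand-in for coalesceOrRepartition(src, partitions);
  // a String tags the "dataset" with its partition count for demonstration.
  static String repartition(String src, int partitions) {
    return src + "@" + partitions;
  }

  // Style used in the PR: conditionally reassign the local variable.
  static String withReassignment(String src, Optional<Integer> sourcePartitions) {
    if (sourcePartitions.isPresent()) {
      src = repartition(src, sourcePartitions.get());
    }
    return src;
  }

  // Style the review suggests: one expression, no mutation.
  static String withMapOrElse(String src, Optional<Integer> sourcePartitions) {
    return sourcePartitions.map(p -> repartition(src, p)).orElse(src);
  }
}
```

Both methods produce the same result; the `map`/`orElse` form avoids reassignment and lets `src` stay effectively final.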
Re: [PR] [HUDI-7821] Handle case where older proto message is read with new schema [hudi]
hudi-bot commented on PR #11373: URL: https://github.com/apache/hudi/pull/11373#issuecomment-2142911283 ## CI report: * 32abc805a2e2d4764215bd7dea93ce72c0532bec UNKNOWN
Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2142900530 ## CI report: * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN * 475a1bc220eaee04fa78ba46a922b434b8306047 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24150) * d504e37ab6cee7d80e53e6daf2df1ef95eea01b7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24169)
[jira] [Updated] (HUDI-7821) Handle schema evolution in proto to avro conversion
[ https://issues.apache.org/jira/browse/HUDI-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7821: - Labels: pull-request-available (was: ) > Handle schema evolution in proto to avro conversion > --- > > Key: HUDI-7821 > URL: https://issues.apache.org/jira/browse/HUDI-7821 > Project: Apache Hudi > Issue Type: Bug >Reporter: Timothy Brown >Priority: Major > Labels: pull-request-available > > Users can encounter errors when a batch of data was written with an older > schema and a new schema has fields that are not present in the old data
[PR] [HUDI-7821] Handle case where older proto message is read with new schema [hudi]
the-other-tim-brown opened a new pull request, #11373: URL: https://github.com/apache/hudi/pull/11373

### Change Logs

- Adds support for handling proto messages that are missing fields; previously this would cause null pointer exceptions.

### Impact

Allows users consuming protos to evolve their schemas.

### Risk level (write none, low medium or high below)

None

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] [HUDI-7669] Move config classes and utils to proper places [hudi]
yihua closed pull request #11095: [HUDI-7669] Move config classes and utils to proper places URL: https://github.com/apache/hudi/pull/11095
[jira] [Created] (HUDI-7821) Handle schema evolution in proto to avro conversion
Timothy Brown created HUDI-7821: --- Summary: Handle schema evolution in proto to avro conversion Key: HUDI-7821 URL: https://issues.apache.org/jira/browse/HUDI-7821 Project: Apache Hudi Issue Type: Bug Reporter: Timothy Brown Users can encounter errors when a batch of data was written with an older schema and a new schema has fields that are not present in the old data
Re: [PR] [HUDI-7669] Move config classes and utils to proper places [hudi]
yihua commented on PR #11095: URL: https://github.com/apache/hudi/pull/11095#issuecomment-2142879176 Closing this PR as it is no longer required.
Re: [PR] [MINOR] Fix operation total io should not exceed the target io limit [hudi]
yihua commented on code in PR #11174: URL: https://github.com/apache/hudi/pull/11174#discussion_r1622874529 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/strategy/BoundedIOCompactionStrategy.java: ## @@ -44,10 +44,10 @@ public List orderAndFilter(HoodieWriteConfig writeCon for (HoodieCompactionOperation op : operations) { long opIo = op.getMetrics().get(TOTAL_IO_MB).longValue(); targetIORemaining -= opIo; - finalOperations.add(op); - if (targetIORemaining <= 0) { + if (targetIORemaining < 0) { return finalOperations; } + finalOperations.add(op); Review Comment: This can lead to starvation if the target IO limit is always smaller than the `TOTAL_IO_MB` of the first compaction operation.
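To make the starvation concern above concrete: with the reordered check, an operation whose `TOTAL_IO_MB` exceeds the remaining budget is rejected before it is added, so if the very first operation is always larger than the configured target IO, no compaction is ever scheduled. One common way out — a hedged sketch, not necessarily the fix the PR settles on — is to always admit at least the first operation. Here each operation is reduced to its `TOTAL_IO_MB` value for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class BoundedIoSketch {

  // Each element is an operation's TOTAL_IO_MB, already in the desired order.
  // The first operation is always admitted, so a single oversized file group
  // cannot block compaction forever; subsequent operations are admitted only
  // while they fit in the remaining budget.
  static List<Long> orderAndFilter(List<Long> operationIoMb, long targetIoMb) {
    List<Long> selected = new ArrayList<>();
    long remaining = targetIoMb;
    for (long opIo : operationIoMb) {
      if (!selected.isEmpty() && opIo > remaining) {
        break; // budget exhausted, but at least one op has been admitted
      }
      selected.add(opIo);
      remaining -= opIo;
    }
    return selected;
  }
}
```

This keeps the PR's goal (do not exceed the target IO with extra operations) while guaranteeing forward progress on an oversized first operation.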
Re: [PR] [HUDI-7816]: Provide SourceProfileSupplier option into the SnapshotLoadQuerySplitter [hudi]
yihua commented on PR #11368: URL: https://github.com/apache/hudi/pull/11368#issuecomment-2142855292 Could you also raise a PR against https://github.com/apache/hudi/tree/branch-0.x?
Re: [PR] [HUDI-7748] Update ErrorTableAwareChainedTransformer.java [hudi]
yihua commented on PR #11197: URL: https://github.com/apache/hudi/pull/11197#issuecomment-2142852669 Are the changes in this PR still needed?
(hudi) branch master updated (9536d40f75d -> 7f8da18e550)
yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

from 9536d40f75d [MINOR] Avoid logging full commit metadata at info level (#11372)
 add 7f8da18e550 [HUDI-7766] Adding staging jar deployment command for Spark 3.5 and Scala 2.13 profile (#11234)

No new revisions were added by this update.

Summary of changes:
 scripts/release/deploy_staging_jars.sh | 9 +
 1 file changed, 9 insertions(+)
Re: [PR] [HUDI-7766] Adding staging jar deployment command for Spark 3.5 and Scala 2.13 profile [hudi]
yihua merged PR #11234: URL: https://github.com/apache/hudi/pull/11234
Re: [PR] [HUDI-7766] Adding staging jar deployment command for Spark 3.5 and Scala 2.13 profile [hudi]
yihua commented on PR #11234: URL: https://github.com/apache/hudi/pull/11234#issuecomment-2142851461 Skipping CI as only the release script is updated.
Re: [PR] [WIP][ENM] Fix pending compaction check 3 [hudi]
yihua commented on PR #11217: URL: https://github.com/apache/hudi/pull/11217#issuecomment-2142849288 Closing this draft. Feel free to reopen when ready for review.
Re: [PR] [WIP][ENM] Fix pending compaction check 3 [hudi]
yihua closed pull request #11217: [WIP][ENM] Fix pending compaction check 3 URL: https://github.com/apache/hudi/pull/11217
Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2142847155 ## CI report: * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN * 475a1bc220eaee04fa78ba46a922b434b8306047 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24150) * d504e37ab6cee7d80e53e6daf2df1ef95eea01b7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7816]: Provide SourceProfileSupplier option into the SnapshotLoadQuerySplitter [hudi]
yihua commented on code in PR #11368: URL: https://github.com/apache/hudi/pull/11368#discussion_r1622849730 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SnapshotLoadQuerySplitter.java: ## @@ -61,20 +62,21 @@ public SnapshotLoadQuerySplitter(TypedProperties properties) { * * @param df The dataset to process. * @param beginCheckpointStr The starting checkpoint string. + * @param sourceProfileSupplier An Option of a SourceProfileSupplier to use in load splitting implementation * @return The next checkpoint as an Option. */ - public abstract Option getNextCheckpoint(Dataset df, String beginCheckpointStr); + public abstract Option getNextCheckpoint(Dataset df, String beginCheckpointStr, Option sourceProfileSupplier); /** - * Retrieves the next checkpoint based on query information. + * Retrieves the next checkpoint based on query information and a SourceProfileSupplier. * * @param df The dataset to process. * @param queryInfo The query information object. * @return Updated query information with the next checkpoint, in case of empty checkpoint, * returning endPoint same as queryInfo.getEndInstant(). */ - public QueryInfo getNextCheckpoint(Dataset df, QueryInfo queryInfo) { -return getNextCheckpoint(df, queryInfo.getStartInstant()) + public QueryInfo getNextCheckpoint(Dataset df, QueryInfo queryInfo, Option sourceProfileSupplier) { Review Comment: Add the new parameter `@param sourceProfileSupplier ` to the javadocs, same as above. ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SnapshotLoadQuerySplitter.java: ## @@ -61,20 +62,21 @@ public SnapshotLoadQuerySplitter(TypedProperties properties) { * * @param df The dataset to process. * @param beginCheckpointStr The starting checkpoint string. 
+ * @param sourceProfileSupplier An Option of a SourceProfileSupplier to use in load splitting implementation Review Comment: Let's mark this class with `@PublicAPIClass(maturity = ApiMaturityLevel.EVOLVING)` and the abstract methods with `@PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)`, given this class serves as an extendable API class for users to plug in custom implementations.
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2142823018 ## CI report: * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN * 18fbd92eec10c49025db364be79cc9dbfccee362 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24162)
(hudi) branch master updated (130ea1a3142 -> 9536d40f75d)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 130ea1a3142 [HUDI-7762] Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In Spark3.5 (#11224) add 9536d40f75d [MINOR] Avoid logging full commit metadata at info level (#11372) No new revisions were added by this update. Summary of changes: .../org/apache/hudi/client/BaseHoodieTableServiceClient.java | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-)
Re: [PR] [MINOR] Avoid logging full commit metadata at info level [hudi]
yihua merged PR #11372: URL: https://github.com/apache/hudi/pull/11372
Re: [PR] [MINOR] Avoid logging full commit metadata at info level [hudi]
yihua commented on PR #11372: URL: https://github.com/apache/hudi/pull/11372#issuecomment-2142819366 Could you also raise a PR against https://github.com/apache/hudi/tree/branch-0.x?
Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]
yihua commented on code in PR #10957: URL: https://github.com/apache/hudi/pull/10957#discussion_r1622828538 ## hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java: ## @@ -231,7 +231,13 @@ private static Option findNestedField(Schema schema, String[] fiel if (!nestedPart.isPresent()) { return Option.empty(); } -return nestedPart; +boolean isUnion = false; Review Comment: Could you write a unit test around the new logic (make the method accessible by the test class)?
Re: [PR] [MINOR] Avoid logging full commit metadata at info level [hudi]
hudi-bot commented on PR #11372: URL: https://github.com/apache/hudi/pull/11372#issuecomment-2142758270 ## CI report: * d9f2656aac6864e31474cc45506ceeefc8b8b36e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24166)
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2142756169 ## CI report: * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN * 2201cb0dea3acbe7597b319be7f14ce7a2a8543f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24165) * 18fbd92eec10c49025db364be79cc9dbfccee362 UNKNOWN
Re: [PR] [MINOR] Avoid logging full commit metadata at info level [hudi]
hudi-bot commented on PR #11372: URL: https://github.com/apache/hudi/pull/11372#issuecomment-2142672637 ## CI report: * d9f2656aac6864e31474cc45506ceeefc8b8b36e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24166)
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2142670862 ## CI report: * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN * 2201cb0dea3acbe7597b319be7f14ce7a2a8543f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24165)
Re: [PR] [MINOR] Avoid logging full commit metadata at info level [hudi]
hudi-bot commented on PR #11372: URL: https://github.com/apache/hudi/pull/11372#issuecomment-2142662394 ## CI report: * d9f2656aac6864e31474cc45506ceeefc8b8b36e UNKNOWN
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2142660333 ## CI report: * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN * 15ed1ad17c8b99804d6e404342a11fab6e212935 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22078) * 2201cb0dea3acbe7597b319be7f14ce7a2a8543f UNKNOWN
Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]
the-other-tim-brown commented on code in PR #11154: URL: https://github.com/apache/hudi/pull/11154#discussion_r1622695605 ## hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java: ## @@ -151,11 +163,11 @@ public static Schema nullableSchema(Schema schema) { * @param schema a avro schema. * @return a hudi type. */ - public static Type buildTypeFromAvroSchema(Schema schema) { + public static Type buildTypeFromAvroSchema(Schema schema, Map existingNameToPositions) { // set flag to check this has not been visited. -Deque visited = new LinkedList(); -AtomicInteger nextId = new AtomicInteger(1); -return visitAvroSchemaToBuildType(schema, visited, true, nextId); +Deque visited = new LinkedList<>(); +AtomicInteger nextId = new AtomicInteger(0); Review Comment: I thought this was a bug since you typically start with 0 when coding
Re: [PR] [HUDI-7713] Enforce ordering of fields during schema reconciliation [hudi]
jonvex commented on code in PR #11154: URL: https://github.com/apache/hudi/pull/11154#discussion_r1622678545 ## hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java: ## @@ -117,10 +120,19 @@ public static Schema convert(Type type, String name) { /** Convert an avro schema into internal type. */ public static Type convertToField(Schema schema) { -return buildTypeFromAvroSchema(schema); +return buildTypeFromAvroSchema(schema, Collections.emptyMap()); } + private static Type convertToField(Schema schema, Map existingFieldNameToPositionMapping) { +return buildTypeFromAvroSchema(schema, existingFieldNameToPositionMapping); + } + + Review Comment: remove empty line ## hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java: ## @@ -151,11 +163,11 @@ public static Schema nullableSchema(Schema schema) { * @param schema a avro schema. * @return a hudi type. */ - public static Type buildTypeFromAvroSchema(Schema schema) { + public static Type buildTypeFromAvroSchema(Schema schema, Map existingNameToPositions) { // set flag to check this has not been visited. -Deque visited = new LinkedList(); -AtomicInteger nextId = new AtomicInteger(1); -return visitAvroSchemaToBuildType(schema, visited, true, nextId); +Deque visited = new LinkedList<>(); +AtomicInteger nextId = new AtomicInteger(0); Review Comment: why do we go from 1->0? 
Is this because we remove ``` if (firstVisitRoot) { nextAssignId = 0; } ``` ## hudi-spark-datasource/hudi-spark-common/src/test/java/org/apache/hudi/TestHoodieSchemaUtils.java: ## @@ -239,6 +240,51 @@ void testMissingColumn(boolean allowDroppedColumns) { } } + @Test + void testFieldReordering() { +// field order changes and incoming schema is missing an existing field +Schema start = createRecord("reorderFields", +createPrimitiveField("field1", Schema.Type.INT), +createPrimitiveField("field2", Schema.Type.INT), +createPrimitiveField("field3", Schema.Type.INT)); +Schema end = createRecord("reorderFields", +createPrimitiveField("field3", Schema.Type.INT), +createPrimitiveField("field1", Schema.Type.INT)); +assertEquals(start, deduceWriterSchema(end, start, true)); + +// nested field ordering changes and new field is added +start = createRecord("reorderNestedFields", +createPrimitiveField("field1", Schema.Type.INT), +createPrimitiveField("field2", Schema.Type.INT), +createArrayField("field3", createRecord("nestedRecord", +createPrimitiveField("nestedField1", Schema.Type.INT), +createPrimitiveField("nestedField2", Schema.Type.INT), +createPrimitiveField("nestedField3", Schema.Type.INT))), +createPrimitiveField("field4", Schema.Type.INT)); +end = createRecord("reorderNestedFields", +createPrimitiveField("field1", Schema.Type.INT), +createPrimitiveField("field2", Schema.Type.INT), +createPrimitiveField("field5", Schema.Type.INT), +createArrayField("field3", createRecord("nestedRecord", +createPrimitiveField("nestedField2", Schema.Type.INT), +createPrimitiveField("nestedField1", Schema.Type.INT), +createPrimitiveField("nestedField3", Schema.Type.INT), +createPrimitiveField("nestedField4", Schema.Type.INT))), +createPrimitiveField("field4", Schema.Type.INT)); + +Schema expected = createRecord("reorderNestedFields", +createPrimitiveField("field1", Schema.Type.INT), +createPrimitiveField("field2", Schema.Type.INT), +createArrayField("field3", 
createRecord("reorderNestedFields.field3", Review Comment: ok, can you please change the nested record name to `reorderNestedFields.field3` in start and end? That way we isolate what we are testing
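The ordering rule these test cases exercise can be sketched in plain Java. This is a hypothetical illustration of the reconciliation behavior the tests describe, not the Hudi implementation — `reconcileOrder` and its list-of-field-names model are invented for clarity: fields already present in the existing schema keep their positions (even if the incoming schema reorders or drops them), and genuinely new incoming fields are appended at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldReorderSketch {
    // Hypothetical model of the rule under test: the existing schema's field
    // order wins; reordered or missing incoming fields do not move or remove
    // existing fields, and new fields land after all existing ones.
    static List<String> reconcileOrder(List<String> existing, List<String> incoming) {
        List<String> result = new ArrayList<>(existing);
        for (String f : incoming) {
            if (!existing.contains(f)) {
                result.add(f); // new field: append after all existing fields
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Incoming schema reorders field3/field1, drops field2, and adds field5.
        System.out.println(reconcileOrder(
                Arrays.asList("field1", "field2", "field3"),
                Arrays.asList("field3", "field1", "field5")));
        // prints [field1, field2, field3, field5]
    }
}
```

This mirrors the shape of the assertions above: `deduceWriterSchema` keeps the existing order and retains the dropped column when that is allowed.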
[PR] [MINOR] Avoid logging full commit metadata at info level [hudi]
the-other-tim-brown opened a new pull request, #11372: URL: https://github.com/apache/hudi/pull/11372 ### Change Logs - Updates log messages to avoid logging full commit metadata after each table service to reduce volume of logs when working with large tables ### Impact - Reduce log volume during normal operation ### Risk level (write none, low medium or high below) None ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
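The change this PR describes — keeping the full commit metadata out of INFO-level output — follows a common logging pattern: emit a compact summary at INFO and gate the verbose payload behind the debug level. A minimal sketch using `java.util.logging` (the class and the `summarize` helper are hypothetical, not the actual Hudi code, which uses SLF4J):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class CommitLogSketch {
    private static final Logger LOG = Logger.getLogger(CommitLogSketch.class.getName());

    // Hypothetical helper: a one-line summary instead of the full metadata blob.
    static String summarize(String fullMetadata, int numFiles) {
        return "Committed " + numFiles + " files (" + fullMetadata.length() + " bytes of metadata)";
    }

    public static void main(String[] args) {
        // Stand-in for a large serialized HoodieCommitMetadata payload.
        String fullMetadata = "{...large JSON commit metadata...}";
        LOG.info(summarize(fullMetadata, 6));                      // concise at INFO
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Full commit metadata: " + fullMetadata);     // verbose only when debug logging is enabled
        }
        System.out.println(summarize(fullMetadata, 6));
    }
}
```

The `isLoggable` guard also avoids building the large log string at all when debug logging is off, which is the main cost on large tables.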
Re: [I] [SUPPORT]Hudi Deltastreamer compaction is taking longer duration [hudi]
ad1happy2go commented on issue #11273: URL: https://github.com/apache/hudi/issues/11273#issuecomment-2142552863 @SuneethaYamani The metadata table helps you reduce file-listing API calls. You can disable it in case it is the only bottleneck, although we want to understand why it's taking so long. Can you share your writer configs?
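The suggestion above — disabling the metadata table when its maintenance, rather than file listing, is the bottleneck — comes down to a single writer config. A hedged sketch (this is the standard key in recent Hudi releases; the default is `true`, so only set it explicitly after confirming the bottleneck):

```
# Turn off the metadata table so listings go straight to the file system.
# Note: this trades away the file-listing optimization described above.
hoodie.metadata.enable=false
```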
Re: [I] duplicated records when use insert overwrite [hudi]
ad1happy2go commented on issue #11358: URL: https://github.com/apache/hudi/issues/11358#issuecomment-2142473345 @njalan Also, as I understood, the data you are writing is the output of 10 tables. So when you are doing insert_overwrite, does that source data frame contain dups?
Re: [PR] [WIP] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
codope commented on code in PR #10422: URL: https://github.com/apache/hudi/pull/10422#discussion_r1622387730 ## hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieFileGroupReaderRecordReader.java: ## @@ -0,0 +1,294 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.hudi.hadoop; + +import org.apache.hudi.avro.HoodieAvroUtils; +import org.apache.hudi.common.config.HoodieCommonConfig; +import org.apache.hudi.common.config.HoodieReaderConfig; +import org.apache.hudi.common.fs.FSUtils; +import org.apache.hudi.common.model.BaseFile; +import org.apache.hudi.common.model.FileSlice; +import org.apache.hudi.common.model.HoodieBaseFile; +import org.apache.hudi.common.model.HoodieFileGroupId; +import org.apache.hudi.common.table.HoodieTableMetaClient; +import org.apache.hudi.common.table.TableSchemaResolver; +import org.apache.hudi.common.table.read.HoodieFileGroupReader; +import org.apache.hudi.common.table.timeline.HoodieInstant; +import org.apache.hudi.common.util.FileIOUtils; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.common.util.TablePathUtils; +import org.apache.hudi.common.util.collection.ExternalSpillableMap; +import org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader; +import org.apache.hudi.hadoop.realtime.RealtimeSplit; +import org.apache.hudi.hadoop.utils.HoodieRealtimeInputFormatUtils; +import org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils; + +import org.apache.avro.Schema; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants; +import org.apache.hadoop.hive.serde2.ColumnProjectionUtils; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.FileSplit; +import org.apache.hadoop.mapred.InputSplit; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.RecordReader; +import org.apache.hadoop.mapred.Reporter; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; +import 
java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.Set; +import java.util.function.UnaryOperator; +import java.util.stream.Collectors; +import java.util.stream.Stream; + +import static org.apache.hudi.common.config.HoodieCommonConfig.DISK_MAP_BITCASK_COMPRESSION_ENABLED; +import static org.apache.hudi.common.config.HoodieCommonConfig.SPILLABLE_DISK_MAP_TYPE; +import static org.apache.hudi.common.config.HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE; +import static org.apache.hudi.common.config.HoodieMemoryConfig.SPILLABLE_MAP_BASE_PATH; + +public class HoodieFileGroupReaderRecordReader implements RecordReader { + + public interface HiveReaderCreator { +org.apache.hadoop.mapred.RecordReader getRecordReader( +final org.apache.hadoop.mapred.InputSplit split, +final org.apache.hadoop.mapred.JobConf job, +final org.apache.hadoop.mapred.Reporter reporter +) throws IOException; + } + + private final HiveHoodieReaderContext readerContext; + private final HoodieFileGroupReader fileGroupReader; + private final ArrayWritable arrayWritable; + private final NullWritable nullWritable = NullWritable.get(); + private final InputSplit inputSplit; + private final JobConf jobConfCopy; + private final UnaryOperator reverseProjection; + + public HoodieFileGroupReaderRecordReader(HiveReaderCreator readerCreator, + final InputSplit split, + final JobConf jobConf, + final Reporter reporter) throws IOException { +this.jobConfCopy = new JobConf(jobConf); +HoodieRealtimeInputFormatUtils.cleanProjectionColumnIds(jobConfCopy); +Set partitionColumns = new HashSet<>(getPartitionFieldNames(jobConfCopy)); +this.inputSplit = split; + +FileSplit fileSplit = (FileSplit) split; +String tableBasePath = getTableBasePath(split, jo
(hudi) branch master updated (0e55f0900d8 -> 130ea1a3142)
This is an automated email from the ASF dual-hosted git repository. leesf pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 0e55f0900d8 [HUDI-7817] Use Jackson Core instead of org.codehaus.jackson for JSON encoding (#11369) add 130ea1a3142 [HUDI-7762] Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In Spark3.5 (#11224) No new revisions were added by this update. Summary of changes: .../src/main/scala/org/apache/spark/sql/hudi/SparkAdapter.scala | 1 - .../main/scala/org/apache/spark/sql/adapter/Spark3_5Adapter.scala | 6 +- 2 files changed, 5 insertions(+), 2 deletions(-)
Re: [PR] [HUDI-7762] Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In Spark3.5 [hudi]
leesf merged PR #11224: URL: https://github.com/apache/hudi/pull/11224
Re: [I] issue with reading the data using hudi streamer [hudi]
ad1happy2go commented on issue #11263: URL: https://github.com/apache/hudi/issues/11263#issuecomment-2141946954 Using the schema registry fixed this issue. Discussed in this thread - https://apache-hudi.slack.com/archives/C4D716NPQ/p1716384858692059
Re: [I] issue with reading the data using hudi streamer [hudi]
codope closed issue #11263: issue with reading the data using hudi streamer URL: https://github.com/apache/hudi/issues/11263
Re: [I] [SUPPORT] - Partial update of the MOR table after compaction with Hudi Streamer [hudi]
ad1happy2go commented on issue #11348: URL: https://github.com/apache/hudi/issues/11348#issuecomment-2141809072 @kirillklimenko I tried to mimic a similar scenario, but it is avoiding columns with null values. Can you come up with a reproducible script?
Re: [I] duplicated records when use insert overwrite [hudi]
ad1happy2go commented on issue #11358: URL: https://github.com/apache/hudi/issues/11358#issuecomment-2141806002 @njalan Are you using multiple writers? Can you come up with a reproducible script? You are using a very old Hudi version, though.
Re: [I] [SUPPORT] Reliable ingestion from AWS S3 using Hudi is failing with software.amazon.awssdk.services.sqs.model.EmptyBatchRequestException [hudi]
ad1happy2go commented on issue #11168: URL: https://github.com/apache/hudi/issues/11168#issuecomment-2141790434 @SuneethaYamani Yeah, this was there in 0.14.1 only. Thanks.
Re: [I] [SUPPORT] Spark-Hudi: Unable to perform Hard delete using Pyspark on HUDI table from AWS Glue [hudi]
codope closed issue #11349: [SUPPORT] Spark-Hudi: Unable to perform Hard delete using Pyspark on HUDI table from AWS Glue URL: https://github.com/apache/hudi/issues/11349
Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]
hudi-bot commented on PR #11043: URL: https://github.com/apache/hudi/pull/11043#issuecomment-2141435482 ## CI report: * 541b544049e68b3d22cdf0f5159fbd9b0005d345 UNKNOWN * b4a5700f408e7ef6639eb05528a029d7de45e99f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24161)
Re: [I] [SUPPORT] StreamWriteFunction support Exectly-Once in Flink ? [hudi]
seekforshell closed issue #11004: [SUPPORT] StreamWriteFunction support Exectly-Once in Flink ? URL: https://github.com/apache/hudi/issues/11004
Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]
hudi-bot commented on PR #11043: URL: https://github.com/apache/hudi/pull/11043#issuecomment-2141364905 ## CI report: * 541b544049e68b3d22cdf0f5159fbd9b0005d345 UNKNOWN * 6ece7645a69b367901c71ab78dea15f39d69fca5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24140) * b4a5700f408e7ef6639eb05528a029d7de45e99f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24161)
Re: [PR] [HUDI-7007] Add bloom_filters index support on read side [hudi]
hudi-bot commented on PR #11043: URL: https://github.com/apache/hudi/pull/11043#issuecomment-2141355244 ## CI report: * 541b544049e68b3d22cdf0f5159fbd9b0005d345 UNKNOWN * 6ece7645a69b367901c71ab78dea15f39d69fca5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24140) * b4a5700f408e7ef6639eb05528a029d7de45e99f UNKNOWN