[GitHub] [hudi] codope commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC
codope commented on PR #8284: URL: https://github.com/apache/hudi/pull/8284#issuecomment-1483741022 Yeah the build succeeded locally for me for Spark 3.3. I just pushed a commit after rebasing with master. Let's see. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC
hudi-bot commented on PR #8284: URL: https://github.com/apache/hudi/pull/8284#issuecomment-1483740825 ## CI report: * 1b032ff4bd9e40fba4bf2bb318a1acaa3f7d0d87 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15897) * 5b31410a0bb28ebb16d0af88f8c45662b6b8fd92 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483737519 ## CI report: * 90db3447a020728c0fc12b3714cd018482b89d1d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15912)
[GitHub] [hudi] codope commented on a diff in pull request #7847: [HUDI-5697] Revisiting refreshing of Hudi relations after write operations on the tables
codope commented on code in PR #7847: URL: https://github.com/apache/hudi/pull/7847#discussion_r1148306772 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieCatalogUtils.scala: ## @@ -17,8 +17,76 @@ package org.apache.spark.sql +import org.apache.spark.sql.catalyst.catalog.CatalogTableType +import org.apache.spark.sql.catalyst.{QualifiedTableName, TableIdentifier} + /** * NOTE: Since support for [[TableCatalog]] was only added in Spark 3, this trait * is going to be an empty one simply serving as a placeholder (for compatibility w/ Spark 2) */ trait HoodieCatalogUtils {} + +object HoodieCatalogUtils { + + /** + * Please check scala-doc for other overloaded [[refreshTable()]] operation + */ + def refreshTable(spark: SparkSession, qualifiedTableName: String): Unit = { +val tableId = spark.sessionState.sqlParser.parseTableIdentifier(qualifiedTableName) +refreshTable(spark, tableId) + } + + /** + * Refreshes metadata and flushes cached data (resolved [[LogicalPlan]] representation, + * already loaded [[InMemoryRelation]]) for the table identified by [[tableId]]. + * + * This method is usually invoked at the end of the write operation to make sure cached + * data/metadata are synchronized with the state on storage. 
+ * + * NOTE: PLEASE READ CAREFULLY BEFORE CHANGING + * This is borrowed from Spark 3.1.3 and modified to satisfy Hudi needs: + * - Unlike Spark canonical implementation, in case of Hudi this method is invoked + *after writes carried out via Spark DataSource integration as well and as such + *in these cases data might actually be missing from the caches, therefore + *actually re-triggering resolution phase (involving file-listing, etc) for the + *first time + * - Additionally, this method is modified to avoid refreshing [[LogicalRelation]] + *completely to make sure that we're not re-triggering the file-listing of the + *table, immediately after it's been written, instead deferring it to subsequent + *read operation + */ + def refreshTable(spark: SparkSession, tableId: TableIdentifier): Unit = { +val sessionCatalog = spark.sessionState.catalog +val tableMetadata = sessionCatalog.getTempViewOrPermanentTableMetadata(tableId) + +// Before proceeding we validate whether this table is actually cached w/in [[SessionCatalog]], +// since, for ex, in case of writing via Spark DataSource (V1) API, Spark wouldn't actually +// resort to caching the data +val cachedPlan = sessionCatalog.getCachedTable( + QualifiedTableName(tableId.database.getOrElse(tableMetadata.database), tableId.identifier)) + +if (cachedPlan != null) { + // NOTE: Provided that this table is still cached, following operation would not be + // triggering subsequent resolution and listing of the table + val table = spark.table(tableId) + + if (tableMetadata.tableType == CatalogTableType.VIEW) { Review Comment: I concur with @YannByron's comment. @alexeykudinkin, what do you think? IMO, not invalidating the relation cache could actually help performance slightly.
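The diffed `refreshTable` hinges on one check: only act when the table is already cached, and avoid re-listing files right after a write. A minimal pure-Scala sketch of that idea follows, assuming a toy in-memory catalog; `ToyCatalog`, `QualifiedName` and `refreshAfterWrite` are hypothetical names, not Spark or Hudi APIs, and the real method also handles views and goes through `spark.table`.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's session catalog and its table cache; not Spark APIs.
final case class QualifiedName(database: String, table: String)

final class ToyCatalog(defaultDb: String) {
  // Resolved "plans" keyed by qualified table name; a String stands in for a LogicalPlan.
  private val cache = mutable.Map.empty[QualifiedName, String]

  // Counts how many times resolution (which would involve file listing) has run.
  private var resolutions = 0
  def resolutionCount: Int = resolutions

  // Simulates a read: resolves (and "lists") only when nothing is cached yet.
  def read(name: QualifiedName): String =
    cache.getOrElseUpdate(name, {
      resolutions += 1
      s"plan(${name.database}.${name.table})"
    })

  def isCached(name: QualifiedName): Boolean = cache.contains(name)

  /** Post-write refresh: acts only on tables that are already cached and defers
    * re-resolution to the next read instead of re-listing right after the write. */
  def refreshAfterWrite(db: Option[String], table: String): Boolean = {
    val name = QualifiedName(db.getOrElse(defaultDb), table)
    if (isCached(name)) {
      cache -= name // drop the stale plan; no listing happens here
      true
    } else {
      false // not cached (e.g. a DataSource V1 write): nothing to do
    }
  }
}
```

In this toy model the listing cost moves from the write path to the first subsequent read, which is the trade-off the reviewers are weighing when they ask whether invalidating the relation cache is needed at all.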
[GitHub] [hudi] duc-dn commented on issue #8273: [SUPPORT] How to connect Hudi cli to MinIO
duc-dn commented on issue #8273: URL: https://github.com/apache/hudi/issues/8273#issuecomment-1483723081 @umehrot2 can you help me, please?
[GitHub] [hudi] wecharyu commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting
wecharyu commented on code in PR #8219: URL: https://github.com/apache/hudi/pull/8219#discussion_r1148304318 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestOperationConifg.scala: ## @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hudi + +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.{HoodieCatalogTable, SessionCatalog} +import org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.buildHoodieInsertConfig + +import scala.reflect.ClassTag + +abstract class TestOperationConfig extends HoodieSparkSqlTestBase { + val catalog: SessionCatalog = spark.sessionState.catalog + Review Comment: This abstract class is meant to be inherited by tests for other operations, such as `buildHoodieDeleteTableConfig`, `buildHoodieDropPartitionsConfig`, etc. A companion object would be more reasonable if we did not need to test other operations.
[GitHub] [hudi] danny0405 commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting
danny0405 commented on code in PR #8219: URL: https://github.com/apache/hudi/pull/8219#discussion_r1148280602 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestOperationConifg.scala: ## @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hudi + +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.{HoodieCatalogTable, SessionCatalog} +import org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.buildHoodieInsertConfig + +import scala.reflect.ClassTag + +abstract class TestOperationConfig extends HoodieSparkSqlTestBase { + val catalog: SessionCatalog = spark.sessionState.catalog + Review Comment: There is no need to make the class abstract. If you want to define some common utility methods, just declare a companion object with the same name.
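The suggestion above is a standard Scala pattern: keep the test class concrete and move shared helpers into its companion object, which the class can import. A minimal sketch, assuming hypothetical names (`TestOperationConfigSketch`, `hasOperation`, `insertIsHonoured` are illustrative, not the PR's code); only the config key `hoodie.datasource.write.operation` is a real Hudi setting.

```scala
// Hypothetical names for illustration; not the classes in the PR.
class TestOperationConfigSketch {
  import TestOperationConfigSketch._

  // A concrete test method reusing the shared helper from the companion object.
  def insertIsHonoured(): Boolean =
    hasOperation(Map(OperationKey -> "insert"), "insert")
}

object TestOperationConfigSketch {
  // Hudi's write-operation config key.
  val OperationKey = "hoodie.datasource.write.operation"

  // Shared utility: true when the config map carries the expected write operation.
  def hasOperation(conf: Map[String, String], expected: String): Boolean =
    conf.get(OperationKey).contains(expected)
}
```

As wecharyu notes in the thread, an abstract base class still makes sense if several test classes must inherit state such as `catalog`; the companion object fits when only stateless helpers are shared.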
[GitHub] [hudi] KnightChess commented on pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag
KnightChess commented on PR #8275: URL: https://github.com/apache/hudi/pull/8275#issuecomment-1483714107 @nsivabalan Using `writeStats` will not trigger the clustering DAG either; I think there is no difference in the result if we use it. https://user-images.githubusercontent.com/20125927/227693933-22fa67f4-1d34-4b25-bf0b-83bfac2b21f5.png
[GitHub] [hudi] danny0405 commented on issue #8267: [SUPPORT] Why some delta commit logs files are not converted to parquet ?
danny0405 commented on issue #8267: URL: https://github.com/apache/hudi/issues/8267#issuecomment-1483713136 1. The `--service` param is a non-valued param (a flag); it takes no value. 2. The path is null: I guess the path refers to the base files to compact; in your use case there are no parquet files, only log files to compact. 3. The `.inflight` file is just a marker file indicating that the transaction of the current instant is still ongoing.
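Point 3 can be made concrete by classifying timeline file names by their suffix. A minimal sketch, assuming the `<instant time>.<action>[.<state>]` naming under `.hoodie/` (for example `20230325101500000.deltacommit.inflight`), where a file without a `.requested`/`.inflight` suffix marks a completed instant and a bare `<instant>.inflight` is an inflight commit. The types and `parse` are hypothetical, not Hudi's `HoodieInstant` API.

```scala
// Simplified model of instant states; hypothetical, not Hudi's HoodieInstant.
sealed trait InstantState
case object Requested extends InstantState
case object Inflight extends InstantState
case object Completed extends InstantState

final case class TimelineInstant(timestamp: String, action: String, state: InstantState)

object TimelineFileSketch {
  // Assumes names like "<ts>.<action>", "<ts>.<action>.requested",
  // "<ts>.<action>.inflight", or "<ts>.inflight" (inflight commit).
  def parse(fileName: String): TimelineInstant =
    fileName.split('.').toList match {
      case ts :: "inflight" :: Nil            => TimelineInstant(ts, "commit", Inflight)
      case ts :: action :: "requested" :: Nil => TimelineInstant(ts, action, Requested)
      case ts :: action :: "inflight" :: Nil  => TimelineInstant(ts, action, Inflight)
      case ts :: action :: Nil                => TimelineInstant(ts, action, Completed)
      case _ =>
        throw new IllegalArgumentException(s"Unrecognized timeline file: $fileName")
    }

  // An instant is pending while only its marker file (not the completed file) exists.
  def isPending(fileName: String): Boolean = parse(fileName).state != Completed
}
```

The marker carries no data; under this model, its presence only tells readers and table services that the instant has started but not yet completed.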
[GitHub] [hudi] danny0405 commented on issue #8274: [SUPPORT] Append Mode should support close the bloom filter option
danny0405 commented on issue #8274: URL: https://github.com/apache/hudi/issues/8274#issuecomment-1483712177
> a certain impact on write throughput

I'm confused about why turning off the bloom filter (BF) increased the write throughput.
[GitHub] [hudi] 15663671003 commented on issue #8287: [SUPPORT]
15663671003 commented on issue #8287: URL: https://github.com/apache/hudi/issues/8287#issuecomment-1483709822 I want to maintain an incremental read program in which a record is taken out when `valud=true` or `update_time != record_time`.
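The selection rule described above is a simple disjunction applied to each batch returned by an incremental query. A minimal sketch, assuming a hypothetical record shape whose fields mirror the column names in the comment (`valud`, `update_time`, `record_time`); the reporter's actual schema is not shown.

```scala
// Hypothetical record shape: `valud`, `updateTime` and `recordTime` mirror the column
// names mentioned in the issue; the real schema is not shown there.
final case class ChangeRecord(key: String, valud: Boolean, updateTime: Long, recordTime: Long)

object IncrementalFilterSketch {
  // Emit a record when the flag is set or its update time differs from its record time.
  def shouldEmit(r: ChangeRecord): Boolean = r.valud || r.updateTime != r.recordTime

  // Apply the predicate to one batch pulled by an incremental query.
  def select(batch: Seq[ChangeRecord]): Seq[ChangeRecord] = batch.filter(shouldEmit)
}
```

On a Spark DataFrame returned by a Hudi incremental read, the same predicate could be written with `org.apache.spark.sql.functions.col` as `df.filter(col("valud") === true || col("update_time") =!= col("record_time"))`, again assuming those column names.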
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483702560 ## CI report: * 7dfa9a09b36fe2f9af728365843cd89877994cb9 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15910) * 90db3447a020728c0fc12b3714cd018482b89d1d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15912)
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483700457 ## CI report: * 2b273f906891d2e4e9fea23c148eb524ae1c667e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15901) * 7dfa9a09b36fe2f9af728365843cd89877994cb9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15910) * 90db3447a020728c0fc12b3714cd018482b89d1d UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
hudi-bot commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483698416 ## CI report: * a5482d72c24d9c8b5b4c30609e2efda16737994a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15911)
[GitHub] [hudi] voonhous commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution
voonhous commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483686787 Not sure why the CI is failing:
```
TestSpark3DDL:
- Test multi change data type
- Test multi change data type2
- Test Enable and Disable Schema on read
- Test Partition Table alter
- Test Chinese table
- Test Alter Table
- Test Alter Table multiple times
- Test Alter Table complex
- Test schema auto evolution complex
- Test schema auto evolution
3157022 [ScalaTest-main-running-TestSpark3DDL] WARN org.apache.hudi.metadata.HoodieBackedTableMetadata [] - Metadata table was not found at path file:/tmp/spark-f80c21e6-ac26-4762-8e46-24b7b8984f60/h31/.hoodie/metadata
- Test DATE to STRING conversions when vectorized reading is not enabled *** FAILED ***
  org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'alter table h31 alter'(line 1, pos 16)

== SQL ==
alter table h31 alter column `date_to_string_col` type string
^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
  at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser$$anonfun$parsePlan$1.apply(HoodieSpark2ExtendedSqlParser.scala:45)
  at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser$$anonfun$parsePlan$1.apply(HoodieSpark2ExtendedSqlParser.scala:42)
  at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser.parse(HoodieSpark2ExtendedSqlParser.scala:80)
  at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser.parsePlan(HoodieSpark2ExtendedSqlParser.scala:42)
  at org.apache.spark.sql.parser.HoodieCommonSqlParser$$anonfun$parsePlan$1.apply(HoodieCommonSqlParser.scala:43)
  at org.apache.spark.sql.parser.HoodieCommonSqlParser$$anonfun$parsePlan$1.apply(HoodieCommonSqlParser.scala:40)
  ...

[INFO] hudi-spark-datasource .. SUCCESS [ 3.232 s]
[INFO] hudi-spark-common_2.11 . SUCCESS [ 21.403 s]
[INFO] hudi-spark2_2.11 ... SUCCESS [01:00 min]
[INFO] hudi-spark_2.11 FAILURE [ 02:03 h]
[INFO] hudi-spark2-common . SUCCESS [ 0.028 s]
```
Just curious why `TestSpark3DDL` is being executed on a target that is running Spark 2.4.
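The trace shows the Spark 2 parser path (`HoodieSpark2ExtendedSqlParser`) rejecting `alter table ... alter column ... type string`, which suggests Spark 3-only DDL is being exercised on a Spark 2.4 build. One way to guard such a test is a version gate. A minimal sketch, assuming that this syntax needs Spark 3 or later; `SparkVersionGateSketch`, `supportsAlterColumnType` and `runIfSpark3` are hypothetical names, not Hudi or Spark APIs.

```scala
// Hypothetical version gate; not part of Hudi or Spark.
object SparkVersionGateSketch {
  // Parses the major version from strings such as "2.4.8" or "3.3.1".
  def majorVersion(sparkVersion: String): Int =
    sparkVersion.takeWhile(_ != '.').toInt

  // Assumption: `ALTER TABLE ... ALTER COLUMN ... TYPE` is only parsed by Spark 3+.
  def supportsAlterColumnType(sparkVersion: String): Boolean =
    majorVersion(sparkVersion) >= 3

  // Runs `body` only on Spark 3+; returns whether it ran.
  def runIfSpark3(sparkVersion: String)(body: => Unit): Boolean =
    if (supportsAlterColumnType(sparkVersion)) { body; true } else false
}
```

In a real suite, the version string would come from `spark.version` (or `org.apache.spark.SPARK_VERSION`), and a skipped case could be reported through ScalaTest's `assume` or `cancel` rather than a boolean.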
[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
hudi-bot commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483685344 ## CI report: * 6cc4481cef5b4548b5688046f6ad5b03bdff9a3f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15908) * a5482d72c24d9c8b5b4c30609e2efda16737994a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15911)
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483685309 ## CI report: * 2b273f906891d2e4e9fea23c148eb524ae1c667e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15901) * 7dfa9a09b36fe2f9af728365843cd89877994cb9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15910)
[GitHub] [hudi] xiarixiaoyao commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution
xiarixiaoyao commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483683848 @voonhous Thank you for your contribution, LGTM
[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
hudi-bot commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483682538 ## CI report: * 6cc4481cef5b4548b5688046f6ad5b03bdff9a3f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15908) * a5482d72c24d9c8b5b4c30609e2efda16737994a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483682518 ## CI report: * 2b273f906891d2e4e9fea23c148eb524ae1c667e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15901) * 7dfa9a09b36fe2f9af728365843cd89877994cb9 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8289: [HUDI-5891] fix clustering on bootstrap tables. Row-writer disabled does not work…
hudi-bot commented on PR #8289: URL: https://github.com/apache/hudi/pull/8289#issuecomment-1483680644 ## CI report: * bedea321750519d71acc0850e2c6b9d1c37132e9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15909)
[GitHub] [hudi] hudi-bot commented on pull request #8289: [HUDI-5891] fix clustering on bootstrap tables. Row-writer disabled does not work…
hudi-bot commented on PR #8289: URL: https://github.com/apache/hudi/pull/8289#issuecomment-1483625455 ## CI report: * bedea321750519d71acc0850e2c6b9d1c37132e9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15909)
[GitHub] [hudi] hudi-bot commented on pull request #8289: [HUDI-5891] fix clustering on bootstrap tables. Row-writer disabled does not work…
hudi-bot commented on PR #8289: URL: https://github.com/apache/hudi/pull/8289#issuecomment-1483618869 ## CI report: * bedea321750519d71acc0850e2c6b9d1c37132e9 UNKNOWN
[GitHub] [hudi] jonvex opened a new pull request, #8289: [HUDI-5891] fix clustering on bootstrap tables. Row-writer disabled does not work…
jonvex opened a new pull request, #8289: URL: https://github.com/apache/hudi/pull/8289 … for spark2.4 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
hudi-bot commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483612636 ## CI report: * 6cc4481cef5b4548b5688046f6ad5b03bdff9a3f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15908)
[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
hudi-bot commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483571586 ## CI report: * 810154088a84751c752caef67a0a2759628c5209 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15906) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15907) * 6cc4481cef5b4548b5688046f6ad5b03bdff9a3f UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
hudi-bot commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483565242 ## CI report: * 810154088a84751c752caef67a0a2759628c5209 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15906) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15907)
[GitHub] [hudi] nsivabalan commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
nsivabalan commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483564763 @hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
hudi-bot commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483486772 ## CI report: * 810154088a84751c752caef67a0a2759628c5209 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15906)
[GitHub] [hudi] hudi-bot commented on pull request #8187: Upgrade aws java sdk to v2
hudi-bot commented on PR #8187: URL: https://github.com/apache/hudi/pull/8187#issuecomment-1483486432 ## CI report: * 052478e775ef24c47336971ce58392fe29e7ac45 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15902)
[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
hudi-bot commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483424319 ## CI report: * f1187da731158ce7463332240971771c54fd9f41 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15904) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15905) * 810154088a84751c752caef67a0a2759628c5209 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
hudi-bot commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483404763 ## CI report: * f1187da731158ce7463332240971771c54fd9f41 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15904) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15905)
[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
hudi-bot commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483365421 ## CI report: * f1187da731158ce7463332240971771c54fd9f41 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15904) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15905)
[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
hudi-bot commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483356312 ## CI report: * f1187da731158ce7463332240971771c54fd9f41 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483348699 ## CI report: * 2b273f906891d2e4e9fea23c148eb524ae1c667e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15901)
[GitHub] [hudi] hudi-bot commented on pull request #8272: use path similar to base file when config is true
hudi-bot commented on PR #8272: URL: https://github.com/apache/hudi/pull/8272#issuecomment-1483348605 ## CI report: * 076980c169ad316345ce81e4c96d3eb4387a51d1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15903)
[GitHub] [hudi] nsivabalan commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
nsivabalan commented on PR #8288: URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483331927 @hudi-bot run azure
[GitHub] [hudi] nsivabalan opened a new pull request, #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test
nsivabalan opened a new pull request, #8288: URL: https://github.com/apache/hudi/pull/8288 ### Change Logs Release 0.12.3 prep triage flaky test ### Impact Release 0.12.3 prep triage flaky test ### Risk level (write none, low medium or high below) low. ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #8272: use path similar to base file when config is true
hudi-bot commented on PR #8272: URL: https://github.com/apache/hudi/pull/8272#issuecomment-1483303489 ## CI report: * 757ff2448d8e19b19be316803fd29a9c89a747bb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15881) * 076980c169ad316345ce81e4c96d3eb4387a51d1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15903)
[GitHub] [hudi] hudi-bot commented on pull request #8187: Upgrade aws java sdk to v2
hudi-bot commented on PR #8187: URL: https://github.com/apache/hudi/pull/8187#issuecomment-1483303167 ## CI report: * 39a14a897c8574f1760d538056b6985344d6eb9a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15721) * 052478e775ef24c47336971ce58392fe29e7ac45 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15902)
[GitHub] [hudi] hudi-bot commented on pull request #8187: Upgrade aws java sdk to v2
hudi-bot commented on PR #8187: URL: https://github.com/apache/hudi/pull/8187#issuecomment-1483294861 ## CI report: * 39a14a897c8574f1760d538056b6985344d6eb9a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15721) * 052478e775ef24c47336971ce58392fe29e7ac45 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8272: use path similar to base file when config is true
hudi-bot commented on PR #8272: URL: https://github.com/apache/hudi/pull/8272#issuecomment-1483295204 ## CI report: * 757ff2448d8e19b19be316803fd29a9c89a747bb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15881) * 076980c169ad316345ce81e4c96d3eb4387a51d1 UNKNOWN
[jira] [Created] (HUDI-5981) Upgrade aws java sdk v2
Rahil Chertara created HUDI-5981: Summary: Upgrade aws java sdk v2 Key: HUDI-5981 URL: https://issues.apache.org/jira/browse/HUDI-5981 Project: Apache Hudi Issue Type: Task Reporter: Rahil Chertara -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader
hudi-bot commented on PR #8277: URL: https://github.com/apache/hudi/pull/8277#issuecomment-1483241834 ## CI report: * 8071f3f4a10a0aa0f3e295985aebdc5ed176e31c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15900)
[GitHub] [hudi] hudi-bot commented on pull request #8285: [HUDI-5979] Add dependencies to hudi-trino-bundle needed for Trino connector
hudi-bot commented on PR #8285: URL: https://github.com/apache/hudi/pull/8285#issuecomment-1483211187 ## CI report: * a459c3d46e7357e0d921e562a2fc79c98b69e7dc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15899)
[hudi] branch master updated (6916803f7a4 -> 41026ef1fea)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 6916803f7a4 [HUDI-5941] Support savepoint call procedure with base path in Spark SQL (#8271) add 41026ef1fea [HUDI-5289] Avoiding repeated trigger of clustering dag (#8275) No new revisions were added by this update. Summary of changes: .../action/commit/BaseCommitActionExecutor.java| 4 ++ .../apache/hudi/functional/TestCOWDataSource.scala | 53 ++ 2 files changed, 57 insertions(+)
[GitHub] [hudi] nsivabalan merged pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag
nsivabalan merged PR #8275: URL: https://github.com/apache/hudi/pull/8275
[GitHub] [hudi] nsivabalan commented on pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag
nsivabalan commented on PR #8275: URL: https://github.com/apache/hudi/pull/8275#issuecomment-1483190292 hey @KnightChess, I'm not sure what your suggestion is. We already check `isEmpty` in `SparkRDDWriteClient`:
```
private void validateClusteringCommit(HoodieWriteMetadata<JavaRDD<WriteStatus>> clusteringMetadata, String clusteringCommitTime, HoodieTable table) {
  if (clusteringMetadata.getWriteStatuses().isEmpty()) {
    HoodieClusteringPlan clusteringPlan = ClusteringUtils.getClusteringPlan(
        table.getMetaClient(), HoodieTimeline.getReplaceCommitRequestedInstant(clusteringCommitTime))
        .map(Pair::getRight).orElseThrow(() -> new HoodieClusteringException(
            "Unable to read clustering plan for instant: " + clusteringCommitTime));
    throw new HoodieClusteringException("Clustering plan produced 0 WriteStatus for " + clusteringCommitTime
        + " #groups: " + clusteringPlan.getInputGroups().size() + " expected at least "
        + clusteringPlan.getInputGroups().stream().mapToInt(HoodieClusteringGroup::getNumOutputFileGroups).sum()
        + " write statuses");
  }
}
```
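The check above only fails when clustering returned no write statuses at all; the "expected at least" figure in the message is the sum of output file groups over the plan's input groups. A minimal plain-Java sketch of that arithmetic, assuming a hypothetical `ClusteringGroup` class in place of `HoodieClusteringGroup` and `IllegalStateException` in place of `HoodieClusteringException` (neither stand-in is Hudi API):

```java
import java.util.List;

final class ClusteringCommitCheck {

    // Sum of output file groups over all input groups, as reported in the error message.
    static int expectedWriteStatuses(List<ClusteringGroup> inputGroups) {
        return inputGroups.stream().mapToInt(ClusteringGroup::numOutputFileGroups).sum();
    }

    // Fails only when no write statuses were produced; a non-zero count below
    // the expected total is not rejected, matching the isEmpty() check above.
    static void validate(int writeStatusCount, String instantTime, List<ClusteringGroup> inputGroups) {
        if (writeStatusCount == 0) {
            throw new IllegalStateException("Clustering plan produced 0 WriteStatus for " + instantTime
                + " #groups: " + inputGroups.size()
                + " expected at least " + expectedWriteStatuses(inputGroups)
                + " write statuses");
        }
    }

    public static void main(String[] args) {
        List<ClusteringGroup> groups = List.of(new ClusteringGroup(2), new ClusteringGroup(3));
        System.out.println(expectedWriteStatuses(groups)); // prints 5
        validate(5, "001", groups); // passes: write statuses are present
    }
}

// Hypothetical stand-in for HoodieClusteringGroup, keeping only the field the check reads.
final class ClusteringGroup {
    private final int numOutputFileGroups;

    ClusteringGroup(int numOutputFileGroups) {
        this.numOutputFileGroups = numOutputFileGroups;
    }

    int numOutputFileGroups() {
        return numOutputFileGroups;
    }
}
```

Because the guard is an emptiness test rather than a count comparison, it catches a clustering run that wrote nothing but not one that wrote fewer file groups than planned.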
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483159402 ## CI report: * 023778950aac15d0a5bcd57f5da2a5d7ffa2971f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15898) * 2b273f906891d2e4e9fea23c148eb524ae1c667e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15901)
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483149756 ## CI report: * 023778950aac15d0a5bcd57f5da2a5d7ffa2971f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15898) * 2b273f906891d2e4e9fea23c148eb524ae1c667e UNKNOWN
[jira] [Created] (HUDI-5980) Add tests to guard against repeated dag trigger using spark event listeners
sivabalan narayanan created HUDI-5980: - Summary: Add tests to guard against repeated dag trigger using spark event listeners Key: HUDI-5980 URL: https://issues.apache.org/jira/browse/HUDI-5980 Project: Apache Hudi Issue Type: Improvement Components: tests-ci Reporter: sivabalan narayanan As of now, we don't have a good way to guard against repeated DAG triggers. All of our existing tests only check the data, but with the reconcile strategy, the extra files are removed if a DAG was repeated, so data-only checks still pass. We might need to add more tests to catch this in case someone changes the DAG in the future. Example test: [https://github.com/apache/hudi/pull/8275/files]
[GitHub] [hudi] nsivabalan commented on a diff in pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag
nsivabalan commented on code in PR #8275: URL: https://github.com/apache/hudi/pull/8275#discussion_r1147858885 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java: ## @@ -255,6 +257,8 @@ protected HoodieWriteMetadata> executeClustering(HoodieC .performClustering(clusteringPlan, schema, instantTime); HoodieData writeStatusList = writeMetadata.getWriteStatuses(); HoodieData statuses = updateIndex(writeStatusList, writeMetadata); +statuses.persist(config.getString(WRITE_STATUS_STORAGE_LEVEL_VALUE), context, HoodieData.HoodieDataCacheKey.of(config.getBasePath(), instantTime)); Review Comment: https://issues.apache.org/jira/browse/HUDI-5980
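The change above persists the clustering write statuses so that later actions reuse them instead of re-running the clustering DAG, and HUDI-5980 asks for tests that detect such re-runs. A minimal, Spark-free sketch of both ideas, assuming a toy lazily evaluated dataset: `LazyData`, `RepeatedDagDemo` and `upstreamRuns` are hypothetical names, and the `AtomicInteger` counter stands in for what a Spark event listener would observe. None of this is Hudi's or Spark's API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class RepeatedDagDemo {
    // Runs two actions over the same data and returns how many times the
    // upstream "clustering" computation executed.
    static int upstreamRuns(boolean persist) {
        AtomicInteger runs = new AtomicInteger();
        LazyData<String> statuses = new LazyData<>(() -> {
            runs.incrementAndGet(); // stands in for rewriting files during clustering
            return "write-statuses";
        });
        if (persist) {
            statuses.persist();
        }
        statuses.collect(); // first action, e.g. validating the write statuses
        statuses.collect(); // second action, e.g. building commit metadata
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println(upstreamRuns(false) + " " + upstreamRuns(true));
    }
}

// Hypothetical stand-in for a lazily evaluated dataset such as an RDD:
// every action re-runs the lineage unless the result has been persisted.
final class LazyData<T> {
    private final Supplier<T> lineage;
    private boolean persisted;
    private boolean computed;
    private T cached;

    LazyData(Supplier<T> lineage) {
        this.lineage = lineage;
    }

    // Marks the data for caching; nothing is computed until the next action.
    LazyData<T> persist() {
        persisted = true;
        return this;
    }

    // An "action": serves the cached value once persisted and computed,
    // otherwise evaluates the lineage again.
    T collect() {
        if (persisted && computed) {
            return cached;
        }
        T value = lineage.get();
        if (persisted) {
            cached = value;
            computed = true;
        }
        return value;
    }
}
```

Running `main` prints `2 1`: without persisting, the two actions run the upstream computation twice; with it, once. In real Spark, persisted blocks can still be evicted and recomputed, so a listener-based test would assert on repeated work rather than assume caching is guaranteed.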
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483136033 ## CI report: * 023778950aac15d0a5bcd57f5da2a5d7ffa2971f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15898)
[GitHub] [hudi] deepikaeswar95 commented on issue #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode
deepikaeswar95 commented on issue #8286: URL: https://github.com/apache/hudi/issues/8286#issuecomment-1483067623 The common config file has been shared via the Hudi community Slack, @soumilshah1995.
[GitHub] [hudi] deepikaeswar95 commented on issue #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode
deepikaeswar95 commented on issue #8286: URL: https://github.com/apache/hudi/issues/8286#issuecomment-1483065843 @soumilshah1995, attaching the latest logs, with Hudi-specific logging enabled, from a run in bulk insert mode: [Hudi logs -bulk insert.txt](https://github.com/apache/hudi/files/11064258/Hudi.logs.-bulk.insert.txt)
[GitHub] [hudi] deepikaeswar95 commented on issue #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode
deepikaeswar95 commented on issue #8286: URL: https://github.com/apache/hudi/issues/8286#issuecomment-1483064629 @soumilshah1995, I have tried partition by partition. We have data from 21 Feb to date. When we load partition by partition using upsert, the data loads correctly, but when the delta streamer runs in continuous mode (upsert), it fails.
[GitHub] [hudi] 15663671003 opened a new issue, #8287: [SUPPORT]
15663671003 opened a new issue, #8287: URL: https://github.com/apache/hudi/issues/8287 I implemented a custom payload based on HoodieRecordPayload, but ran into problems. When I run an incremental query, `record_time` holds the value from the incoming payload (incorrect). When I run a snapshot query, `record_time` holds the old value (correct). The incremental result does not meet my expectations. Does the payload seen by incremental queries differ from the snapshot query results? Please help.
```java
/* Omitted content */
public class CustomPayload extends OverwriteWithLatestAvroPayload {
  /* Omitted content */
  @Override
  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties) throws IOException {
    if (recordBytes.length == 0) {
      return Option.empty();
    }
    GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema);
    if (!needUpdatingPersistedRecord(currentValue, incomingRecord, properties)) {
      return Option.of(currentValue);
    }
    /* custom code */
    if (((GenericRecord) currentValue).get("record_time") != null) {
      incomingRecord.put("record_time", ((GenericRecord) currentValue).get("record_time"));
    }
    eventTime = updateEventTime(incomingRecord, properties);
    return isDeleteRecord(incomingRecord) ? Option.empty() : Option.of(incomingRecord);
  }
  /* Omitted content */
  protected boolean needUpdatingPersistedRecord(IndexedRecord currentValue, IndexedRecord incomingRecord, Properties properties) {
    /* Omitted content */
    return (((Comparable) persistedOrderingVal).compareTo(incomingOrderingVal) < 0)
        && (((GenericRecord) currentValue).get("valid").equals(true)
            || ((GenericRecord) incomingRecord).get("valid").equals(true))
        && (((GenericRecord) currentValue).get("content_md5") == null
            || !((GenericRecord) currentValue).get("content_md5").equals(((GenericRecord) incomingRecord).get("content_md5")));
  }
}
```
**Expected behavior** ```shell >>> spark.read.format("hudi").load("*").filter("*").show(truncate=False) +---++ +--++ |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key |_hoodie_partition_path|_hoodie_file_name |valid|update_time|record_time|content_md5 |id | +---++ +--++ |20230324152704225 |20230324152704225_13_2563860|7306da3dd0c41ff504447981c4e850949db69524154c0c5bf85e62758babf3cc | |0013-ea30-4f9d-9704-e4f82fceb940-0 | false|2023-03-24 14:48:09|2023-03-01 21:40:42|df3c2a9f8eaf5b8eec26b363cc67003f |7306da3dd0c41ff504447981c4e850949db69524154c0c5bf85e62758babf3cc| >>> df = spark.read.format("hudi").options(**{'hoodie.datasource.query.type': "incremental", "hoodie.datasource.read.begin.instanttime": '20230324151944584'}).load("**") >>> spark.read.format("hudi").load("*").filter("*").show(truncate=False) +---++ +--++ |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key |_hoodie_partition_path|_hoodie_file_name |valid|update_time|record_time|content_md5 |id | +---++ +--++ |20230324152704225 |20230324152704225_13_2563860|7306da3dd0c41ff50444
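The merge rule the custom payload is meant to enforce (keep the stored `record_time`, take everything else from a newer incoming record) can be checked in isolation. A minimal plain-Java sketch, assuming maps in place of Avro `GenericRecord`s and `update_time` as the ordering field (the issue omits the real ordering field); `PreserveRecordTimeMerge` is a hypothetical name. It models only the merge itself and does not explain why the incremental query returns a different value, which depends on how Hudi reads the table for each query type.

```java
import java.util.HashMap;
import java.util.Map;

final class PreserveRecordTimeMerge {

    // Returns the record to keep when `incoming` arrives for a key already stored as `current`.
    // Hypothetical simplification: records are maps, ordering compares the "update_time"
    // strings (lexicographic order works for "yyyy-MM-dd HH:mm:ss"), and the issue's
    // "valid"/"content_md5" conditions and delete handling are left out.
    static Map<String, Object> merge(Map<String, Object> current, Map<String, Object> incoming) {
        String persistedTs = (String) current.get("update_time");
        String incomingTs = (String) incoming.get("update_time");
        if (persistedTs.compareTo(incomingTs) >= 0) {
            return current; // stored record wins, like needUpdatingPersistedRecord returning false
        }
        Map<String, Object> merged = new HashMap<>(incoming); // copy instead of mutating the input
        Object recordTime = current.get("record_time");
        if (recordTime != null) {
            merged.put("record_time", recordTime); // carry the first-seen record_time forward
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>(
            Map.of("update_time", "2023-03-01 21:40:42", "record_time", "2023-03-01 21:40:42"));
        Map<String, Object> update = new HashMap<>(
            Map.of("update_time", "2023-03-24 14:48:09", "record_time", "2023-03-24 14:48:09"));
        System.out.println(merge(stored, update).get("record_time")); // prints 2023-03-01 21:40:42
    }
}
```

Copying the incoming record rather than calling `put` on it keeps the merge free of side effects, which makes a test like this one easier to reason about.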
[GitHub] [hudi] soumilshah1995 commented on issue #8260: [SUPPORT] How to implement incremental join
soumilshah1995 commented on issue #8260: URL: https://github.com/apache/hudi/issues/8260#issuecomment-1483022972 What do you think? Does this solve your issue?
[GitHub] [hudi] soumilshah1995 commented on issue #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode
soumilshah1995 commented on issue #8286: URL: https://github.com/apache/hudi/issues/8286#issuecomment-1483021653 I think it's making too many calls to S3. Can you try reading the data partition by partition? Can you share your configs?
[GitHub] [hudi] hudi-bot commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC
hudi-bot commented on PR #8284: URL: https://github.com/apache/hudi/pull/8284#issuecomment-1482984410 ## CI report: * 1b032ff4bd9e40fba4bf2bb318a1acaa3f7d0d87 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15897)
[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting
hudi-bot commented on PR #8219: URL: https://github.com/apache/hudi/pull/8219#issuecomment-1482952270 ## CI report: * 3fed80ecdcfabf29904025fa84e2d08505351189 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15896)
[GitHub] [hudi] stathismar commented on issue #8278: [SUPPORT] Deltastreamer Fails with AWSDmsAvroPayload
stathismar commented on issue #8278: URL: https://github.com/apache/hudi/issues/8278#issuecomment-1482931370 **Update:** I tried submitting the DeltaStreamer Spark job to a local Minikube cluster and the issue went away. I'm not sure what the problem was exactly; most probably it has to do with an incompatibility between my system's Java version and Spark 3.3.1/Hudi. To run Spark with Hudi on K8s, I used the official Spark image and everything worked as expected.
[GitHub] [hudi] nicholas-fwang commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC
nicholas-fwang commented on PR #8284: URL: https://github.com/apache/hudi/pull/8284#issuecomment-1482900107 @codope thanks for the review. I'm trying to find the checkstyle violation behind the failure of https://github.com/apache/hudi/actions/runs/4510471335/jobs/7943933966?pr=8284, but I couldn't reproduce it in my build, which just succeeds. Do you know what the checkstyle violation in this PR is? Thanks.
[GitHub] [hudi] hudi-bot commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader
hudi-bot commented on PR #8277: URL: https://github.com/apache/hudi/pull/8277#issuecomment-1482889660 ## CI report: * cfc853da93c08d1317c9de4a7c4116ddc89e8344 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15894) * 8071f3f4a10a0aa0f3e295985aebdc5ed176e31c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15900)
[GitHub] [hudi] hudi-bot commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader
hudi-bot commented on PR #8277: URL: https://github.com/apache/hudi/pull/8277#issuecomment-1482875309 ## CI report: * cfc853da93c08d1317c9de4a7c4116ddc89e8344 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15894) * 8071f3f4a10a0aa0f3e295985aebdc5ed176e31c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8285: [HUDI-5979] Add dependencies to hudi-trino-bundle needed for Trino connector
hudi-bot commented on PR #8285: URL: https://github.com/apache/hudi/pull/8285#issuecomment-1482860942 ## CI report: * a459c3d46e7357e0d921e562a2fc79c98b69e7dc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15899)
[GitHub] [hudi] hudi-bot commented on pull request #8163: [HUDI-5921] Partition path should be considered in BucketIndexConcurrentFileWritesConflictResolutionStrategy
hudi-bot commented on PR #8163: URL: https://github.com/apache/hudi/pull/8163#issuecomment-1482859722 ## CI report: * fa6a26972e75f23b195c24cd51619f6409b42c95 UNKNOWN * 7bb2b915a31acb05af30d6672fa755a4a1bc59ea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15895)
[GitHub] [hudi] codope closed issue #7589: [Support] Keep only clustered file(all) after cleaning
codope closed issue #7589: [Support] Keep only clustered file(all) after cleaning URL: https://github.com/apache/hudi/issues/7589
[GitHub] [hudi] codope merged pull request #8271: [HUDI-5941] Support savepoint call procedure with base path in Spark SQL
codope merged PR #8271: URL: https://github.com/apache/hudi/pull/8271
[hudi] branch master updated: [HUDI-5941] Support savepoint call procedure with base path in Spark SQL (#8271)
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 6916803f7a4 [HUDI-5941] Support savepoint call procedure with base path in Spark SQL (#8271) 6916803f7a4 is described below commit 6916803f7a40a4af57e7de1f927a4d5aa7025e32 Author: Y Ethan Guo AuthorDate: Fri Mar 24 06:45:58 2023 -0700 [HUDI-5941] Support savepoint call procedure with base path in Spark SQL (#8271) --- .../procedures/CreateSavepointProcedure.scala | 8 ++- .../procedures/DeleteSavepointProcedure.scala | 8 ++- .../procedures/RollbackToSavepointProcedure.scala | 8 ++- .../procedures/ShowSavepointsProcedure.scala | 6 +- .../hudi/procedure/TestSavepointsProcedure.scala | 70 +++--- 5 files changed, 69 insertions(+), 31 deletions(-) diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/CreateSavepointProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/CreateSavepointProcedure.scala index e81b6f086a2..8a40cfb502d 100644 --- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/CreateSavepointProcedure.scala +++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/CreateSavepointProcedure.scala @@ -28,10 +28,11 @@ import java.util.function.Supplier class CreateSavepointProcedure extends BaseProcedure with ProcedureBuilder with Logging { private val PARAMETERS = Array[ProcedureParameter]( -ProcedureParameter.required(0, "table", DataTypes.StringType, None), +ProcedureParameter.optional(0, "table", DataTypes.StringType, None), ProcedureParameter.required(1, "commit_time", DataTypes.StringType, None), ProcedureParameter.optional(2, "user", DataTypes.StringType, ""), -ProcedureParameter.optional(3, "comments", DataTypes.StringType, "") 
+ProcedureParameter.optional(3, "comments", DataTypes.StringType, ""), +ProcedureParameter.optional(4, "path", DataTypes.StringType, None) ) private val OUTPUT_TYPE = new StructType(Array[StructField]( @@ -46,11 +47,12 @@ class CreateSavepointProcedure extends BaseProcedure with ProcedureBuilder with super.checkArgs(PARAMETERS, args) val tableName = getArgValueOrDefault(args, PARAMETERS(0)) +val tablePath = getArgValueOrDefault(args, PARAMETERS(4)) val commitTime = getArgValueOrDefault(args, PARAMETERS(1)).get.asInstanceOf[String] val user = getArgValueOrDefault(args, PARAMETERS(2)).get.asInstanceOf[String] val comments = getArgValueOrDefault(args, PARAMETERS(3)).get.asInstanceOf[String] -val basePath: String = getBasePath(tableName) +val basePath: String = getBasePath(tableName, tablePath) val metaClient = HoodieTableMetaClient.builder.setConf(jsc.hadoopConfiguration()).setBasePath(basePath).build val activeTimeline: HoodieActiveTimeline = metaClient.getActiveTimeline diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/DeleteSavepointProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/DeleteSavepointProcedure.scala index 1cdd0638f1a..5d3b9b22285 100644 --- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/DeleteSavepointProcedure.scala +++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/DeleteSavepointProcedure.scala @@ -28,8 +28,9 @@ import java.util.function.Supplier class DeleteSavepointProcedure extends BaseProcedure with ProcedureBuilder with Logging { private val PARAMETERS = Array[ProcedureParameter]( -ProcedureParameter.required(0, "table", DataTypes.StringType, None), -ProcedureParameter.required(1, "instant_time", DataTypes.StringType, None) +ProcedureParameter.optional(0, "table", DataTypes.StringType, None), +ProcedureParameter.required(1, 
"instant_time", DataTypes.StringType, None), +ProcedureParameter.optional(2, "path", DataTypes.StringType, None) ) private val OUTPUT_TYPE = new StructType(Array[StructField]( @@ -44,9 +45,10 @@ class DeleteSavepointProcedure extends BaseProcedure with ProcedureBuilder with super.checkArgs(PARAMETERS, args) val tableName = getArgValueOrDefault(args, PARAMETERS(0)) +val tablePath = getArgValueOrDefault(args, PARAMETERS(2)) val instantTime = getArgValueOrDefault(args, PARAMETERS(1)).get.asInstanceOf[String] -val basePath: String = getBasePath(tableName) +val basePath: String = getBasePath(tableName, tablePath) val metaClient = HoodieTableMetaClient.builder.setConf(jsc.hadoopConfiguration()).setBasePath(basePath).build val completedInstants = met
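With `table` made optional and a new `path` parameter added, each savepoint procedure resolves the base path from either argument via `getBasePath(tableName, tablePath)`. The Java sketch below illustrates that resolution under stated assumptions: the table name takes precedence when both are given, and supplying neither is an error. `BasePathResolver`, `resolveBasePath` and the catalog-lookup function are hypothetical names for illustration, not Hudi's actual API.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: resolve a table's base path from either a table name
// (looked up in a catalog) or an explicit path. Not Hudi's actual code.
final class BasePathResolver {

  private BasePathResolver() {}

  static String resolveBasePath(Optional<String> tableName,
                                Optional<String> tablePath,
                                Function<String, String> catalogLookup) {
    // Assumption: when both arguments are given, the table name wins.
    if (tableName.isPresent()) {
      return catalogLookup.apply(tableName.get());
    }
    // Otherwise fall back to the explicit path, failing if neither was given.
    return tablePath.orElseThrow(() ->
        new IllegalArgumentException("Either table or path must be provided"));
  }
}
```

Based on the parameter names in the diff, a Spark SQL call could then target a table by location only, for example `call create_savepoint(path => '/path/to/table', commit_time => '<instant_time>')`, where both values are placeholders.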
[GitHub] [hudi] codope commented on a diff in pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag
codope commented on code in PR #8275: URL: https://github.com/apache/hudi/pull/8275#discussion_r1147590832 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java: ## @@ -255,6 +257,8 @@ protected HoodieWriteMetadata> executeClustering(HoodieC .performClustering(clusteringPlan, schema, instantTime); HoodieData writeStatusList = writeMetadata.getWriteStatuses(); HoodieData statuses = updateIndex(writeStatusList, writeMetadata); +statuses.persist(config.getString(WRITE_STATUS_STORAGE_LEVEL_VALUE), context, HoodieData.HoodieDataCacheKey.of(config.getBasePath(), instantTime)); Review Comment: Good call, and thanks for adding a test using `StageEventManager`. Could you create a JIRA to add more such tests? We need DAG tests to guard against changes to the DAG.
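The `persist` call matters because, on Spark, `HoodieData` wraps an RDD, which is lazily evaluated: each action on an uncached dataset re-runs the lineage that produced it, which here would presumably re-trigger the clustering work. The library-free Java sketch below shows the effect with a hypothetical `LazyData` class and a counter standing in for the clustering job; it is an illustration of lazy evaluation and caching, not Hudi's or Spark's implementation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazily evaluated dataset such as an RDD.
final class LazyData<T> {
  private final Supplier<List<T>> compute;
  private List<T> cached; // filled on first action once persisted
  private boolean persisted;

  LazyData(Supplier<List<T>> compute) {
    this.compute = compute;
  }

  LazyData<T> persist() {
    this.persisted = true;
    return this;
  }

  // Each "action" either reuses the cached result or recomputes the lineage.
  List<T> collect() {
    if (persisted) {
      if (cached == null) {
        cached = compute.get();
      }
      return cached;
    }
    return compute.get();
  }

  long count() {
    return collect().size();
  }
}

final class PersistDemo {
  // Runs two actions (count, then collect) and returns how many times the
  // expensive computation was triggered.
  static int runs(boolean persist) {
    AtomicInteger clusteringRuns = new AtomicInteger();
    LazyData<String> statuses = new LazyData<>(() -> {
      clusteringRuns.incrementAndGet(); // stands in for re-running clustering
      return List.of("status-1", "status-2");
    });
    if (persist) {
      statuses.persist();
    }
    statuses.count();
    statuses.collect();
    return clusteringRuns.get();
  }
}
```

Without `persist()` the two actions trigger the computation twice; with it, only once. That is also the kind of behavior a DAG test such as the `StageEventManager` one can assert on.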
[GitHub] [hudi] deepikaeswar95 commented on issue #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode
deepikaeswar95 commented on issue #8286: URL: https://github.com/apache/hudi/issues/8286#issuecomment-1482804011 The Spark job fails when delta streamer is run in upsert continuous mode or in bulk insert mode.
[GitHub] [hudi] deepikaeswar95 opened a new issue, #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode
deepikaeswar95 opened a new issue, #8286: URL: https://github.com/apache/hudi/issues/8286 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org. - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly. **Describe the problem you faced** A clear and concise description of the problem. **To Reproduce** Steps to reproduce the behavior: 1. 2. 3. 4. **Expected behavior** A clear and concise description of what you expected to happen. **Environment Description** * Hudi version : * Spark version : * Hive version : * Hadoop version : * Storage (HDFS/S3/GCS..) : * Running on Docker? (yes/no) : **Additional context** Add any other context about the problem here. **Stacktrace** ```Add the stacktrace of the error.```
[GitHub] [hudi] codope closed issue #8257: [SUPPORT]HoodieDeltaStreamer (0.13.0 ),FileSystem is null,resulting in a NullPointerException
codope closed issue #8257: [SUPPORT]HoodieDeltaStreamer (0.13.0 ),FileSystem is null,resulting in a NullPointerException URL: https://github.com/apache/hudi/issues/8257
[GitHub] [hudi] codope commented on issue #8257: [SUPPORT]HoodieDeltaStreamer (0.13.0 ),FileSystem is null,resulting in a NullPointerException
codope commented on issue #8257: URL: https://github.com/apache/hudi/issues/8257#issuecomment-1482802157 Closing the issue as we have a fix.
[GitHub] [hudi] Zouxxyy commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader
Zouxxyy commented on PR #8277: URL: https://github.com/apache/hudi/pull/8277#issuecomment-1482799152 > @Zouxxyy Looks like there is a checkstyle violation. Can you please correct? https://github.com/apache/hudi/actions/runs/4509620758/jobs/7944047354?pr=8277 done
[GitHub] [hudi] hudi-bot commented on pull request #8285: [HUDI-5979] Add dependencies to hudi-trino-bundle needed for Trino connector
hudi-bot commented on PR #8285: URL: https://github.com/apache/hudi/pull/8285#issuecomment-1482793073 ## CI report: * a459c3d46e7357e0d921e562a2fc79c98b69e7dc UNKNOWN
[GitHub] [hudi] codope commented on a diff in pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader
codope commented on code in PR #8277: URL: https://github.com/apache/hudi/pull/8277#discussion_r1147573904 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java: ## @@ -168,8 +170,14 @@ protected ClosableIterator> deserializeRecords(byte[] conten // Get schema from the header Schema writerSchema = new Schema.Parser().parse(super.getLogBlockHeader().get(HeaderMetadataType.SCHEMA)); +HoodieLogBlockContentLocation blockContentLoc = getBlockContentLocation().get(); +Configuration inlineConf = new Configuration(blockContentLoc.getHadoopConf()); +inlineConf.set("fs." + InLineFileSystem.SCHEME + ".impl", InLineFileSystem.class.getName()); +inlineConf.setClassLoader(InLineFileSystem.class.getClassLoader()); + +FileSystem fs = FSUtils.getFs(pathForReader.toString(), inlineConf); Review Comment: Sounds good @Zouxxyy, as long as we are OK with compatibility.
[jira] [Updated] (HUDI-5979) Replace individual hudi modules by hudi-trino-bundle in Trino Hudi connector
[ https://issues.apache.org/jira/browse/HUDI-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5979: - Labels: pull-request-available (was: ) > Replace individual hudi modules by hudi-trino-bundle in Trino Hudi connector > > > Key: HUDI-5979 > URL: https://issues.apache.org/jira/browse/HUDI-5979 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Major > Labels: pull-request-available > Fix For: 0.13.1 > > > Follow up to https://issues.apache.org/jira/browse/HUDI-3097 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] codope opened a new pull request, #8285: [HUDI-5979] Add dependencies to hudi-trino-bundle needed for Trino connector
codope opened a new pull request, #8285: URL: https://github.com/apache/hudi/pull/8285 ### Change Logs Add `hudi-client-common` and `hudi-java-client` to `hudi-trino-bundle`. The Trino-Hudi connector uses the write client for some tests. Eventually, we want to add write capabilities as well. I have tested the bundle in [Trino](https://github.com/codope/trino/blob/upgrade-hudi-0.13.0/plugin/trino-hudi/pom.xml). ### Impact The bundle size grows by 1 MB, from 39.5 MB to 40.5 MB. ### Risk level (write none, low medium or high below) low ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[GitHub] [hudi] jonvex commented on a diff in pull request #8272: use path similar to base file when config is true
jonvex commented on code in PR #8272: URL: https://github.com/apache/hudi/pull/8272#discussion_r1147558900 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -332,8 +333,13 @@ private HoodieData> readRecordsForGroupBaseFiles(JavaSparkContex List>> iteratorsForPartition = new ArrayList<>(); clusteringOpsPartition.forEachRemaining(clusteringOp -> { try { + boolean isBootstrapSkeleton = !clusteringOp.getBootstrapFilePath().isEmpty(); Schema readerSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(writeConfig.getSchema())); HoodieFileReader baseFileReader = HoodieFileReaderFactory.getReaderFactory(recordType).getFileReader(hadoopConf.get(), new Path(clusteringOp.getDataFilePath())); + if (isBootstrapSkeleton) { Review Comment: Yes
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution causing table …
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1482770247 ## CI report: * 54fd6e37af699de1add3d92b57b6a3437623feb3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15893) * 023778950aac15d0a5bcd57f5da2a5d7ffa2971f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15898)
[GitHub] [hudi] hudi-bot commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader
hudi-bot commented on PR #8277: URL: https://github.com/apache/hudi/pull/8277#issuecomment-1482770134 ## CI report: * cfc853da93c08d1317c9de4a7c4116ddc89e8344 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15894)
[GitHub] [hudi] codope commented on a diff in pull request #8272: use path similar to base file when config is true
codope commented on code in PR #8272: URL: https://github.com/apache/hudi/pull/8272#discussion_r1147541565 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -332,8 +333,13 @@ private HoodieData> readRecordsForGroupBaseFiles(JavaSparkContex List>> iteratorsForPartition = new ArrayList<>(); clusteringOpsPartition.forEachRemaining(clusteringOp -> { try { + boolean isBootstrapSkeleton = !clusteringOp.getBootstrapFilePath().isEmpty(); Schema readerSchema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(writeConfig.getSchema())); HoodieFileReader baseFileReader = HoodieFileReaderFactory.getReaderFactory(recordType).getFileReader(hadoopConf.get(), new Path(clusteringOp.getDataFilePath())); + if (isBootstrapSkeleton) { Review Comment: So, for full bootstrap mode, it still goes through the usual base file reader, correct? ## hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieBootstrapFileReader.java: ## @@ -0,0 +1,88 @@ +package org.apache.hudi.io.storage; + +import org.apache.hudi.avro.HoodieAvroUtils; +import org.apache.hudi.common.bloom.BloomFilter; +import org.apache.hudi.common.model.HoodieRecord; +import org.apache.hudi.common.model.MetadataValues; +import org.apache.hudi.common.util.collection.ClosableIterator; + +import org.apache.avro.Schema; + +import java.io.IOException; +import java.util.Set; + + +public class HoodieBootstrapFileReader implements HoodieFileReader { + + private HoodieFileReader skeletonFileReader; + private HoodieFileReader dataFileReader; + private Boolean isConsistentLogicalTimestampEnabled; + + public HoodieBootstrapFileReader(HoodieFileReader skeletonFileReader, HoodieFileReader dataFileReader, Boolean isConsistentLogicalTimestampEnabled) { +this.skeletonFileReader = skeletonFileReader; +this.dataFileReader = dataFileReader; +this.isConsistentLogicalTimestampEnabled = isConsistentLogicalTimestampEnabled; + } + @Override + public 
String[] readMinMaxRecordKeys() { +return skeletonFileReader.readMinMaxRecordKeys(); + } + + @Override + public BloomFilter readBloomFilter() { +return skeletonFileReader.readBloomFilter(); + } + + @Override + public Set filterRowKeys(Set candidateRowKeys) { +return skeletonFileReader.filterRowKeys(candidateRowKeys); + } + + @Override + public ClosableIterator> getRecordIterator(Schema readerSchema, Schema requestedSchema) throws IOException { +ClosableIterator> skeletonIterator = skeletonFileReader.getRecordIterator(readerSchema, requestedSchema); +ClosableIterator> dataFileIterator = dataFileReader.getRecordIterator(HoodieAvroUtils.removeMetadataFields(readerSchema), requestedSchema); + +return new ClosableIterator>() { + @Override + public void close() { + skeletonIterator.close(); + dataFileIterator.close(); + } + + @Override + public boolean hasNext() { +return skeletonIterator.hasNext() && dataFileIterator.hasNext(); + } + + @Override + public HoodieRecord next() { +HoodieRecord dataRecord = dataFileIterator.next(); +HoodieRecord skeletonRecord = skeletonIterator.next(); +HoodieRecord ret = dataRecord.prependMetaFields(readerSchema, readerSchema, new MetadataValues(). +setCommitTime(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.COMMIT_TIME_METADATA_FIELD )) +.setCommitSeqno(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.COMMIT_SEQNO_METADATA_FIELD)) +.setRecordKey(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.RECORD_KEY_METADATA_FIELD)) +.setPartitionPath(skeletonRecord.getRecordKey(readerSchema, HoodieRecord.PARTITION_PATH_METADATA_FIELD)) Review Comment: Is the skeleton record giving the partition path?
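`HoodieBootstrapFileReader` walks the skeleton file (which holds the Hudi metadata columns) and the original data file in lockstep and stitches each pair into one record, which only works if both files list their records in the same order. The pairing pattern can be sketched generically in Java; the `ZipIterator` class below is hypothetical and uses plain `java.util.Iterator` instead of Hudi's `ClosableIterator`, so it omits `close()`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Hypothetical sketch: combine two iterators positionally, stopping as soon
// as either side is exhausted (the same hasNext() rule as in the reviewed code).
final class ZipIterator<A, B, R> implements Iterator<R> {
  private final Iterator<A> skeleton;
  private final Iterator<B> data;
  private final BiFunction<A, B, R> merge;

  ZipIterator(Iterator<A> skeleton, Iterator<B> data, BiFunction<A, B, R> merge) {
    this.skeleton = skeleton;
    this.data = data;
    this.merge = merge;
  }

  @Override
  public boolean hasNext() {
    return skeleton.hasNext() && data.hasNext();
  }

  @Override
  public R next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    // Advance both sides together so row i of one pairs with row i of the other.
    return merge.apply(skeleton.next(), data.next());
  }
}
```

One design consequence worth noting: because `hasNext()` requires both sides, a row-count mismatch between the two files would silently drop the extra records rather than fail; whether the real reader should detect that is a separate choice.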
[GitHub] [hudi] KnightChess commented on issue #8283: [SUPPORT] In version 0.13.0, when using dynamic partition to insert overwrite data, the table will be cleared first, and then the corresponding pa
KnightChess commented on issue #8283: URL: https://github.com/apache/hudi/issues/8283#issuecomment-1482748957 @nsivabalan @yihua @XuQianJin-Stars @weimingdiit I think this needs a reminder in the docs, or a check added in 0.13.1. What do you think?
[GitHub] [hudi] KnightChess commented on issue #8283: [SUPPORT] In version 0.13.0, when using dynamic partition to insert overwrite data, the table will be cleared first, and then the corresponding pa
KnightChess commented on issue #8283: URL: https://github.com/apache/hudi/issues/8283#issuecomment-1482745990 It looks like #7365 changed the dynamic overwrite behavior. Before that PR, Hudi's insert overwrite was always dynamic, and the release notes at `https://hudi.apache.org/releases/release-0.13.0` don't mention the change. This can cause serious data problems when upgrading to 0.13.0: users may delete all their data by mistake. Hudi may need a config that makes users aware of this behavior, or a check that prevents overwriting the whole table.
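To make the difference concrete: with dynamic partition overwrite, only the partitions present in the incoming data are replaced, while a static `INSERT OVERWRITE` with no partition spec replaces the whole table. Spark exposes this choice as `spark.sql.sources.partitionOverwriteMode` (`STATIC` by default, or `DYNAMIC`); how Hudi 0.13.0 maps onto it is not modeled here. The toy Java model below treats a table as a map from partition path to rows and is an illustration only, not Hudi's implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of INSERT OVERWRITE semantics: a table is a map from partition
// path to its rows. Illustrative only; not how Hudi stores or writes data.
final class OverwriteModes {
  enum Mode { STATIC, DYNAMIC }

  private OverwriteModes() {}

  static Map<String, List<String>> insertOverwrite(
      Map<String, List<String>> table,
      Map<String, List<String>> incoming,
      Mode mode) {
    // STATIC with no partition spec: start from an empty table, so every
    // existing partition is dropped. DYNAMIC: keep untouched partitions.
    Map<String, List<String>> result =
        mode == Mode.STATIC ? new TreeMap<>() : new TreeMap<>(table);
    // In both modes, partitions present in the incoming data are replaced.
    result.putAll(incoming);
    return result;
  }
}
```

Overwriting only `dt=2023-03-24` keeps `dt=2023-03-23` in `DYNAMIC` mode but drops it in `STATIC` mode, which is the silent data loss the comment warns about.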
[jira] [Created] (HUDI-5979) Replace individual hudi modules by hudi-trino-bundle in Trino Hudi connector
Sagar Sumit created HUDI-5979: - Summary: Replace individual hudi modules by hudi-trino-bundle in Trino Hudi connector Key: HUDI-5979 URL: https://issues.apache.org/jira/browse/HUDI-5979 Project: Apache Hudi Issue Type: Improvement Reporter: Sagar Sumit Assignee: Sagar Sumit Fix For: 0.13.1 Follow up to https://issues.apache.org/jira/browse/HUDI-3097
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution causing table …
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1482721544 ## CI report: * 54fd6e37af699de1add3d92b57b6a3437623feb3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15893) * 023778950aac15d0a5bcd57f5da2a5d7ffa2971f UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution causing table …
hudi-bot commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1482712698 ## CI report: * 54fd6e37af699de1add3d92b57b6a3437623feb3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15893)
[GitHub] [hudi] kazdy commented on issue #8261: [SUPPORT] How to reduce hoodie commit latency
kazdy commented on issue #8261: URL: https://github.com/apache/hudi/issues/8261#issuecomment-1482653969 The AWS EMR team provided me with a patched Hudi 0.12.1 jar. You can ask AWS Support for it, along with instructions on how to deploy it to the cluster.
[GitHub] [hudi] hudi-bot commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC
hudi-bot commented on PR #8284: URL: https://github.com/apache/hudi/pull/8284#issuecomment-1482647182 ## CI report: * 1b032ff4bd9e40fba4bf2bb318a1acaa3f7d0d87 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15897)
[GitHub] [hudi] hudi-bot commented on pull request #8227: [HUDI-5952] Fix NPE when use kafka callback
hudi-bot commented on PR #8227: URL: https://github.com/apache/hudi/pull/8227#issuecomment-1482646921 ## CI report: * 4cc7ab6ab87a640bcb68c97c55f642fde9ed5ecc UNKNOWN * 36abd1831338c963296e82e37d502db0deb5cc3b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15891)
[GitHub] [hudi] hudi-bot commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC
hudi-bot commented on PR #8284: URL: https://github.com/apache/hudi/pull/8284#issuecomment-1482638115 ## CI report: * 1b032ff4bd9e40fba4bf2bb318a1acaa3f7d0d87 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8231: [HUDI-5963] Release 0.13.1 prep
hudi-bot commented on PR #8231: URL: https://github.com/apache/hudi/pull/8231#issuecomment-1482626905 ## CI report: * f59475005e6bfd827761e39f44cfca547654f1ff Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15889)
[jira] [Updated] (HUDI-5978) spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC
[ https://issues.apache.org/jira/browse/HUDI-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5978: - Labels: pull-request-available (was: ) > spark timeline timezone is not updated when hoodie.table.timeline.timezone is > UTC > - > > Key: HUDI-5978 > URL: https://issues.apache.org/jira/browse/HUDI-5978 > Project: Apache Hudi > Issue Type: Bug > Components: spark >Reporter: inki hwang >Priority: Minor > Labels: pull-request-available > > The commit timezone is not updated in the HoodieSparkSqlWriter write method. > For example, if the LOCAL time zone is KST (UTC+9), then even though > 'hoodie.table.timeline.timezone' is UTC, the first instant time is created > in LOCAL time (KST) before initTable is called. > The second instant time, created after initTable, is in UTC, so it waits > because the first (KST) instant time is 9 hours ahead of it. > In other situations, a write method started when there is already an > initialized table does not call setCommitTimezone.
[GitHub] [hudi] nicholas-fwang opened a new pull request, #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC
nicholas-fwang opened a new pull request, #8284: URL: https://github.com/apache/hudi/pull/8284

### Change Logs

Create the instant time after setCommitTimezone if the table exists, or after initTable if it does not.

### Impact

When hoodie.table.timeline.timezone is UTC and the LOCAL timezone is not UTC, timeline actions do not progress.

### Risk level (write none, low medium or high below)

none

### Documentation Update

The commit timezone is not updated in the HoodieSparkSqlWriter write method. For example, if the LOCAL timezone is KST (UTC+9) and 'hoodie.table.timeline.timezone' is UTC, the first instant time is still created in LOCAL time (KST), and only then is initTable called. The second instant time, created after initTable, is in UTC, so that write waits because the first instant time (KST) is 9 hours ahead of the second (UTC). In another case, a write started against an already initialized table does not call setCommitTimezone at all.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
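The change log describes an ordering fix: resolve the commit timezone first, then create the instant time. A minimal sketch of that ordering follows, assuming the timeline timezone setting takes the values `LOCAL` and `UTC` and assuming a `yyyyMMddHHmmssSSS` instant layout. `ResolveZoneFirst`, `applyTimelineTimezone`, `newInstantTime` and `startWrite` are hypothetical names, not Hudi APIs, and the local zone is injected so the output does not depend on the machine running it.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class ResolveZoneFirst {
    // Assumed instant-time layout, for illustration only.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    private final ZoneId localZone;
    private ZoneId commitZone;

    ResolveZoneFirst(ZoneId localZone) {
        this.localZone = localZone;
        this.commitZone = localZone; // default until the table setting is applied
    }

    // Hypothetical stand-in for applying the timeline timezone setting ("LOCAL" or "UTC").
    void applyTimelineTimezone(String setting) {
        commitZone = "UTC".equals(setting) ? ZoneOffset.UTC : localZone;
    }

    // Formats the given moment as an instant string in the current commit zone.
    String newInstantTime(Instant now) {
        return FMT.withZone(commitZone).format(now);
    }

    // Ordering from the change log: apply the timezone first, then create the instant.
    String startWrite(String tableTimezoneSetting, Instant now) {
        applyTimelineTimezone(tableTimezoneSetting);
        return newInstantTime(now);
    }

    public static void main(String[] args) {
        ResolveZoneFirst writer = new ResolveZoneFirst(ZoneId.of("Asia/Seoul"));
        Instant t0 = Instant.parse("2023-03-24T00:00:00Z");
        String first = writer.startWrite("UTC", t0);
        String second = writer.startWrite("UTC", t0.plusSeconds(60));
        System.out.println(first);  // 20230324000000000
        System.out.println(second); // 20230324000100000
        System.out.println(first.compareTo(second) < 0); // true
    }
}
```

Because the zone is applied inside `startWrite` before any instant string is produced, there is no window in which the first instant is formatted in the JVM's local zone; that is the property the change log aims for, sketched here without Hudi's actual classes.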
[jira] [Updated] (HUDI-5978) spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC
[ https://issues.apache.org/jira/browse/HUDI-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

inki hwang updated HUDI-5978:
    Summary: spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC  (was: spark timeline timezone is not updated when hoodie.table.timeline.timezone is not UTC)

> spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC
>
>         Key: HUDI-5978
>         URL: https://issues.apache.org/jira/browse/HUDI-5978
>     Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>    Reporter: inki hwang
>    Priority: Minor
>
> The commit timezone is not updated in the HoodieSparkSqlWriter write method.
> For example, if the LOCAL timezone is KST (UTC+9) and 'hoodie.table.timeline.timezone' is UTC, the first instant time is still created in LOCAL time (KST), and only then is initTable called.
> The second instant time, created after initTable, is in UTC, so that write waits because the first instant time (KST) is 9 hours ahead of the second (UTC).
> In another case, a write started against an already initialized table does not call setCommitTimezone at all.
[jira] [Created] (HUDI-5978) spark timeline timezone is not updated when hoodie.table.timeline.timezone is not UTC
inki hwang created HUDI-5978:

     Summary: spark timeline timezone is not updated when hoodie.table.timeline.timezone is not UTC
         Key: HUDI-5978
         URL: https://issues.apache.org/jira/browse/HUDI-5978
     Project: Apache Hudi
  Issue Type: Bug
  Components: spark
    Reporter: inki hwang

The commit timezone is not updated in the HoodieSparkSqlWriter write method.
For example, if the LOCAL timezone is KST (UTC+9) and 'hoodie.table.timeline.timezone' is UTC, the first instant time is still created in LOCAL time (KST), and only then is initTable called.
The second instant time, created after initTable, is in UTC, so that write waits because the first instant time (KST) is 9 hours ahead of the second (UTC).
In another case, a write started against an already initialized table does not call setCommitTimezone at all.
[GitHub] [hudi] weimingdiit opened a new issue, #8283: [SUPPORT] In version 0.13.0, when using dynamic partition to write data, the table will be cleared first, and then the corresponding partition da
weimingdiit opened a new issue, #8283: URL: https://github.com/apache/hudi/issues/8283

**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

A clear and concise description of the problem.

**To Reproduce**

Steps to reproduce the behavior:

1.
2.
3.
4.

**Expected behavior**

A clear and concise description of what you expected to happen.

**Environment Description**

* Hudi version :
* Spark version :
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) :
* Running on Docker? (yes/no) :

**Additional context**

Add any other context about the problem here.

**Stacktrace**

```Add the stacktrace of the error.```
[GitHub] [hudi] haripriyarhp commented on issue #8153: [SUPPORT] Async Clustering failing for MoR in 0.13.0
haripriyarhp commented on issue #8153: URL: https://github.com/apache/hudi/issues/8153#issuecomment-1482577064 @nsivabalan: Yes, async compaction is running without any failures (though there are performance issues), but async clustering is not working. Today I even tried creating a new table, and it throws the same error as above.