[GitHub] [hudi] codope commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC

2023-03-24 Thread via GitHub


codope commented on PR #8284:
URL: https://github.com/apache/hudi/pull/8284#issuecomment-1483741022

   Yeah, the build succeeded locally for me for Spark 3.3. I just pushed a commit after rebasing onto master. Let's see.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8284:
URL: https://github.com/apache/hudi/pull/8284#issuecomment-1483740825

   
   ## CI report:
   
   * 1b032ff4bd9e40fba4bf2bb318a1acaa3f7d0d87 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15897)
   * 5b31410a0bb28ebb16d0af88f8c45662b6b8fd92 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483737519

   
   ## CI report:
   
   * 90db3447a020728c0fc12b3714cd018482b89d1d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15912)
   
   



[GitHub] [hudi] codope commented on a diff in pull request #7847: [HUDI-5697] Revisiting refreshing of Hudi relations after write operations on the tables

2023-03-24 Thread via GitHub


codope commented on code in PR #7847:
URL: https://github.com/apache/hudi/pull/7847#discussion_r1148306772


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieCatalogUtils.scala:
##
@@ -17,8 +17,76 @@
 
 package org.apache.spark.sql
 
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.catalyst.{QualifiedTableName, TableIdentifier}
+
 /**
  * NOTE: Since support for [[TableCatalog]] was only added in Spark 3, this trait
  *   is going to be an empty one simply serving as a placeholder (for compatibility w/ Spark 2)
  */
 trait HoodieCatalogUtils {}
+
+object HoodieCatalogUtils {
+
+  /**
+   * Please check scala-doc for other overloaded [[refreshTable()]] operation
+   */
+  def refreshTable(spark: SparkSession, qualifiedTableName: String): Unit = {
+    val tableId = spark.sessionState.sqlParser.parseTableIdentifier(qualifiedTableName)
+    refreshTable(spark, tableId)
+  }
+
+  /**
+   * Refreshes metadata and flushes cached data (resolved [[LogicalPlan]] representation,
+   * already loaded [[InMemoryRelation]]) for the table identified by [[tableId]].
+   *
+   * This method is usually invoked at the end of the write operation to make sure cached
+   * data/metadata are synchronized with the state on storage.
+   *
+   * NOTE: PLEASE READ CAREFULLY BEFORE CHANGING
+   *       This is borrowed from Spark 3.1.3 and modified to satisfy Hudi needs:
+   *          - Unlike Spark canonical implementation, in case of Hudi this method is invoked
+   *            after writes carried out via Spark DataSource integration as well and as such
+   *            in these cases data might actually be missing from the caches, therefore
+   *            actually re-triggering resolution phase (involving file-listing, etc) for the
+   *            first time
+   *          - Additionally, this method is modified to avoid refreshing [[LogicalRelation]]
+   *            completely to make sure that we're not re-triggering the file-listing of the
+   *            table, immediately after it's been written, instead deferring it to subsequent
+   *            read operation
+   */
+  def refreshTable(spark: SparkSession, tableId: TableIdentifier): Unit = {
+    val sessionCatalog = spark.sessionState.catalog
+    val tableMetadata = sessionCatalog.getTempViewOrPermanentTableMetadata(tableId)
+
+    // Before proceeding we validate whether this table is actually cached w/in [[SessionCatalog]],
+    // since, for ex, in case of writing via Spark DataSource (V1) API, Spark wouldn't actually
+    // resort to caching the data
+    val cachedPlan = sessionCatalog.getCachedTable(
+      QualifiedTableName(tableId.database.getOrElse(tableMetadata.database), tableId.identifier))
+
+    if (cachedPlan != null) {
+      // NOTE: Provided that this table is still cached, following operation would not be
+      //       triggering subsequent resolution and listing of the table
+      val table = spark.table(tableId)
+
+      if (tableMetadata.tableType == CatalogTableType.VIEW) {

Review Comment:
   I concur with @YannByron's comment.
   @alexeykudinkin What do you think? IMO, not invalidating the relation cache could actually help performance slightly.






[GitHub] [hudi] duc-dn commented on issue #8273: [SUPPORT] How to connect Hudi cli to MinIO

2023-03-24 Thread via GitHub


duc-dn commented on issue #8273:
URL: https://github.com/apache/hudi/issues/8273#issuecomment-1483723081

   @umehrot2 can you help me, please?





[GitHub] [hudi] wecharyu commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

2023-03-24 Thread via GitHub


wecharyu commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1148304318


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestOperationConifg.scala:
##
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{HoodieCatalogTable, SessionCatalog}
+import org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.buildHoodieInsertConfig
+
+import scala.reflect.ClassTag
+
+abstract class TestOperationConfig extends HoodieSparkSqlTestBase {
+  val catalog: SessionCatalog = spark.sessionState.catalog
+

Review Comment:
   This abstract class is meant to be inherited by tests for other operations, such as `buildHoodieDeleteTableConfig` and `buildHoodieDropPartitionsConfig`. A companion object is more reasonable if we do not need to test other operations.






[GitHub] [hudi] danny0405 commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

2023-03-24 Thread via GitHub


danny0405 commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1148280602


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestOperationConifg.scala:
##
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{HoodieCatalogTable, SessionCatalog}
+import org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.buildHoodieInsertConfig
+
+import scala.reflect.ClassTag
+
+abstract class TestOperationConfig extends HoodieSparkSqlTestBase {
+  val catalog: SessionCatalog = spark.sessionState.catalog
+

Review Comment:
   No need to make the class abstract. If you want to define some common utility methods, just declare a companion object with the same name.
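
   The companion-object suggestion can be illustrated with a minimal sketch. All class and method names below are hypothetical and are not taken from the PR; only the config key `hoodie.datasource.write.operation` is an existing Hudi option. The sketch assumes the class and the object are defined in the same source file, which Scala requires for companions.

```scala
// Hypothetical test-like class; it is concrete, not abstract.
class OperationConfigSuite {
  // Bring the shared helpers from the companion object into scope.
  import OperationConfigSuite._

  // Example check that uses a shared helper.
  def insertConfigHasOperation(): Boolean =
    operationOf(Map(OperationKey -> "insert")).contains("insert")
}

// Companion object: same name as the class, defined in the same file.
// Shared utilities live here, so no abstract base class is needed
// just to share helper methods.
object OperationConfigSuite {
  val OperationKey = "hoodie.datasource.write.operation"

  // Look up the configured write operation, if any.
  def operationOf(config: Map[String, String]): Option[String] =
    config.get(OperationKey)
}
```

   If other suites later need the same helpers, they can call `OperationConfigSuite.operationOf` directly; an abstract base class only becomes useful once subclasses must share state or fixture setup.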






[GitHub] [hudi] KnightChess commented on pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag

2023-03-24 Thread via GitHub


KnightChess commented on PR #8275:
URL: https://github.com/apache/hudi/pull/8275#issuecomment-1483714107

   @nsivabalan Using `writeStats` will not trigger the clustering DAG either, and I think there is no difference in the result if we use it.
   https://user-images.githubusercontent.com/20125927/227693933-22fa67f4-1d34-4b25-bf0b-83bfac2b21f5.png
   





[GitHub] [hudi] danny0405 commented on issue #8267: [SUPPORT] Why some delta commit logs files are not converted to parquet ?

2023-03-24 Thread via GitHub


danny0405 commented on issue #8267:
URL: https://github.com/apache/hudi/issues/8267#issuecomment-1483713136

   1. The `--service` param takes no value; it is a non-valued param.
   2. The path is null: I guess the path means the base files to compact. In your use case, there are no parquet files, only logs to compact.
   3. The `.inflight` file is just a marker file indicating that the transaction of the current instant is ongoing.
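
   As a rough illustration of point 3, the sketch below classifies timeline file names by suffix. It assumes the naming convention in which pending instants end in `.requested` or `.inflight` and anything else is treated as completed; the object name, the state names, and the sample file names are hypothetical and not part of Hudi's API.

```scala
object InstantState {
  sealed trait State
  case object Requested extends State
  case object Inflight extends State
  case object Completed extends State

  // Classify a timeline file name by its suffix (assumed convention):
  // "*.requested" -> scheduled, "*.inflight" -> transaction ongoing,
  // anything else -> treated as completed.
  def of(fileName: String): State =
    if (fileName.endsWith(".requested")) Requested
    else if (fileName.endsWith(".inflight")) Inflight
    else Completed
}
```

   Under this assumption, an `.inflight` file left behind with no matching completed file would suggest an instant that never finished.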





[GitHub] [hudi] danny0405 commented on issue #8274: [SUPPORT] Append Mode should support close the bloom filter option

2023-03-24 Thread via GitHub


danny0405 commented on issue #8274:
URL: https://github.com/apache/hudi/issues/8274#issuecomment-1483712177

   > a certain impact on write throughput
   
   I'm confused why turning off the BF increased the write throughput.





[GitHub] [hudi] 15663671003 commented on issue #8287: [SUPPORT]

2023-03-24 Thread via GitHub


15663671003 commented on issue #8287:
URL: https://github.com/apache/hudi/issues/8287#issuecomment-1483709822

   I want to maintain an incremental read program: when valud=true or update_time != record_time, the record is taken out.





[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483702560

   
   ## CI report:
   
   * 7dfa9a09b36fe2f9af728365843cd89877994cb9 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15910)
   * 90db3447a020728c0fc12b3714cd018482b89d1d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15912)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483700457

   
   ## CI report:
   
   * 2b273f906891d2e4e9fea23c148eb524ae1c667e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15901)
   * 7dfa9a09b36fe2f9af728365843cd89877994cb9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15910)
   * 90db3447a020728c0fc12b3714cd018482b89d1d UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483698416

   
   ## CI report:
   
   * a5482d72c24d9c8b5b4c30609e2efda16737994a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15911)
   
   



[GitHub] [hudi] voonhous commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution

2023-03-24 Thread via GitHub


voonhous commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483686787

   Not sure why the CI is failing
   
   ```
   TestSpark3DDL:
   - Test multi change data type
   - Test multi change data type2
   - Test Enable and Disable Schema on read
   - Test Partition Table alter 
   - Test Chinese table 
   - Test Alter Table
   - Test Alter Table multiple times
   - Test Alter Table complex
   - Test schema auto evolution complex
   - Test schema auto evolution
   3157022 [ScalaTest-main-running-TestSpark3DDL] WARN  org.apache.hudi.metadata.HoodieBackedTableMetadata [] - Metadata table was not found at path file:/tmp/spark-f80c21e6-ac26-4762-8e46-24b7b8984f60/h31/.hoodie/metadata
   - Test DATE to STRING conversions when vectorized reading is not enabled *** FAILED ***
     org.apache.spark.sql.catalyst.parser.ParseException: no viable alternative at input 'alter table h31 alter'(line 1, pos 16)
   
   == SQL ==
   alter table h31 alter column `date_to_string_col` type string
   ^^^
     at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
     at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
     at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
     at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
     at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser$$anonfun$parsePlan$1.apply(HoodieSpark2ExtendedSqlParser.scala:45)
     at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser$$anonfun$parsePlan$1.apply(HoodieSpark2ExtendedSqlParser.scala:42)
     at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser.parse(HoodieSpark2ExtendedSqlParser.scala:80)
     at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser.parsePlan(HoodieSpark2ExtendedSqlParser.scala:42)
     at org.apache.spark.sql.parser.HoodieCommonSqlParser$$anonfun$parsePlan$1.apply(HoodieCommonSqlParser.scala:43)
     at org.apache.spark.sql.parser.HoodieCommonSqlParser$$anonfun$parsePlan$1.apply(HoodieCommonSqlParser.scala:40)
     ...
   
   [INFO] hudi-spark-datasource .. SUCCESS [  3.232 s]
   [INFO] hudi-spark-common_2.11 . SUCCESS [ 21.403 s]
   [INFO] hudi-spark2_2.11 ... SUCCESS [01:00 min]
   [INFO] hudi-spark_2.11  FAILURE [  02:03 h]
   [INFO] hudi-spark2-common . SUCCESS [  0.028 s]
   ```
   
   Just curious why `TestSpark3DDL` is being executed on a target that is running Spark 2.4.
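
   The stack trace goes through `HoodieSpark2ExtendedSqlParser`, which suggests the suite ran under Spark 2, where this `alter column ... type` statement fails to parse. One way such a test could be guarded is sketched below. This is only an illustration: `SparkVersionGuard` and its methods are hypothetical helpers, not Hudi or Spark APIs, and the version string is assumed to look like `2.4.4` or `3.3.1` (for example, the value of `spark.version`).

```scala
object SparkVersionGuard {
  // Extract the major version from a string such as "2.4.4" or "3.3.1".
  def majorVersion(sparkVersion: String): Int =
    sparkVersion.takeWhile(_.isDigit).toInt

  def isSpark3OrLater(sparkVersion: String): Boolean =
    majorVersion(sparkVersion) >= 3

  // Evaluate `body` only on Spark 3+; return None (skipped) otherwise.
  // `body` is passed by name, so it is not evaluated when skipped.
  def onSpark3OrLater[A](sparkVersion: String)(body: => A): Option[A] =
    if (isSpark3OrLater(sparkVersion)) Some(body) else None
}
```

   A test body wrapped in `SparkVersionGuard.onSpark3OrLater(spark.version) { ... }` would then be skipped on a Spark 2.4 build instead of failing in the parser.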





[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483685344

   
   ## CI report:
   
   * 6cc4481cef5b4548b5688046f6ad5b03bdff9a3f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15908)
   * a5482d72c24d9c8b5b4c30609e2efda16737994a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15911)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483685309

   
   ## CI report:
   
   * 2b273f906891d2e4e9fea23c148eb524ae1c667e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15901)
   * 7dfa9a09b36fe2f9af728365843cd89877994cb9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15910)
   
   



[GitHub] [hudi] xiarixiaoyao commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution

2023-03-24 Thread via GitHub


xiarixiaoyao commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483683848

   @voonhous 
   Thank you for your contribution, LGTM





[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483682538

   
   ## CI report:
   
   * 6cc4481cef5b4548b5688046f6ad5b03bdff9a3f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15908)
   * a5482d72c24d9c8b5b4c30609e2efda16737994a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483682518

   
   ## CI report:
   
   * 2b273f906891d2e4e9fea23c148eb524ae1c667e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15901)
   * 7dfa9a09b36fe2f9af728365843cd89877994cb9 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8289: [HUDI-5891] fix clustering on bootstrap tables. Row-writer disabled does not work…

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8289:
URL: https://github.com/apache/hudi/pull/8289#issuecomment-1483680644

   
   ## CI report:
   
   * bedea321750519d71acc0850e2c6b9d1c37132e9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15909)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8289: [HUDI-5891] fix clustering on bootstrap tables. Row-writer disabled does not work…

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8289:
URL: https://github.com/apache/hudi/pull/8289#issuecomment-1483625455

   
   ## CI report:
   
   * bedea321750519d71acc0850e2c6b9d1c37132e9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15909)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8289: [HUDI-5891] fix clustering on bootstrap tables. Row-writer disabled does not work…

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8289:
URL: https://github.com/apache/hudi/pull/8289#issuecomment-1483618869

   
   ## CI report:
   
   * bedea321750519d71acc0850e2c6b9d1c37132e9 UNKNOWN
   
   



[GitHub] [hudi] jonvex opened a new pull request, #8289: [HUDI-5891] fix clustering on bootstrap tables. Row-writer disabled does not work…

2023-03-24 Thread via GitHub


jonvex opened a new pull request, #8289:
URL: https://github.com/apache/hudi/pull/8289

   … for spark2.4
   
   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483612636

   
   ## CI report:
   
   * 6cc4481cef5b4548b5688046f6ad5b03bdff9a3f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15908)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483571586

   
   ## CI report:
   
   * 810154088a84751c752caef67a0a2759628c5209 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15906) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15907)
   * 6cc4481cef5b4548b5688046f6ad5b03bdff9a3f UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483565242

   
   ## CI report:
   
   * 810154088a84751c752caef67a0a2759628c5209 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15906) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15907)
   
   



[GitHub] [hudi] nsivabalan commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


nsivabalan commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483564763

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483486772

   
   ## CI report:
   
   * 810154088a84751c752caef67a0a2759628c5209 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15906)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8187: Upgrade aws java sdk to v2

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8187:
URL: https://github.com/apache/hudi/pull/8187#issuecomment-1483486432

   
   ## CI report:
   
   * 052478e775ef24c47336971ce58392fe29e7ac45 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15902)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483424319

   
   ## CI report:
   
   * f1187da731158ce7463332240971771c54fd9f41 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15904)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15905)
 
   * 810154088a84751c752caef67a0a2759628c5209 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483404763

   
   ## CI report:
   
   * f1187da731158ce7463332240971771c54fd9f41 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15904)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15905)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483365421

   
   ## CI report:
   
   * f1187da731158ce7463332240971771c54fd9f41 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15904)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15905)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483356312

   
   ## CI report:
   
   * f1187da731158ce7463332240971771c54fd9f41 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483348699

   
   ## CI report:
   
   * 2b273f906891d2e4e9fea23c148eb524ae1c667e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15901)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8272: use path similar to base file when config is true

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8272:
URL: https://github.com/apache/hudi/pull/8272#issuecomment-1483348605

   
   ## CI report:
   
   * 076980c169ad316345ce81e4c96d3eb4387a51d1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15903)
 
   
   



[GitHub] [hudi] nsivabalan commented on pull request #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


nsivabalan commented on PR #8288:
URL: https://github.com/apache/hudi/pull/8288#issuecomment-1483331927

   @hudi-bot run azure





[GitHub] [hudi] nsivabalan opened a new pull request, #8288: [HUDI-5975] Release 0.12.3 prep triage flaky test

2023-03-24 Thread via GitHub


nsivabalan opened a new pull request, #8288:
URL: https://github.com/apache/hudi/pull/8288

   ### Change Logs
   
   Release 0.12.3 prep triage flaky test
   
   ### Impact
   
   Release 0.12.3 prep triage flaky test
   
   ### Risk level (write none, low medium or high below)
   
   low.
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8272: use path similar to base file when config is true

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8272:
URL: https://github.com/apache/hudi/pull/8272#issuecomment-1483303489

   
   ## CI report:
   
   * 757ff2448d8e19b19be316803fd29a9c89a747bb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15881)
 
   * 076980c169ad316345ce81e4c96d3eb4387a51d1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15903)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8187: Upgrade aws java sdk to v2

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8187:
URL: https://github.com/apache/hudi/pull/8187#issuecomment-1483303167

   
   ## CI report:
   
   * 39a14a897c8574f1760d538056b6985344d6eb9a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15721)
 
   * 052478e775ef24c47336971ce58392fe29e7ac45 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15902)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8187: Upgrade aws java sdk to v2

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8187:
URL: https://github.com/apache/hudi/pull/8187#issuecomment-1483294861

   
   ## CI report:
   
   * 39a14a897c8574f1760d538056b6985344d6eb9a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15721)
 
   * 052478e775ef24c47336971ce58392fe29e7ac45 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8272: use path similar to base file when config is true

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8272:
URL: https://github.com/apache/hudi/pull/8272#issuecomment-1483295204

   
   ## CI report:
   
   * 757ff2448d8e19b19be316803fd29a9c89a747bb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15881)
 
   * 076980c169ad316345ce81e4c96d3eb4387a51d1 UNKNOWN
   
   



[jira] [Created] (HUDI-5981) Upgrade aws java sdk v2

2023-03-24 Thread Rahil Chertara (Jira)
Rahil Chertara created HUDI-5981:


 Summary: Upgrade aws java sdk v2
 Key: HUDI-5981
 URL: https://issues.apache.org/jira/browse/HUDI-5981
 Project: Apache Hudi
  Issue Type: Task
Reporter: Rahil Chertara






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8277:
URL: https://github.com/apache/hudi/pull/8277#issuecomment-1483241834

   
   ## CI report:
   
   * 8071f3f4a10a0aa0f3e295985aebdc5ed176e31c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15900)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8285: [HUDI-5979] Add dependencies to hudi-trino-bundle needed for Trino connector

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8285:
URL: https://github.com/apache/hudi/pull/8285#issuecomment-1483211187

   
   ## CI report:
   
   * a459c3d46e7357e0d921e562a2fc79c98b69e7dc Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15899)
 
   
   



[hudi] branch master updated (6916803f7a4 -> 41026ef1fea)

2023-03-24 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 6916803f7a4 [HUDI-5941] Support savepoint call procedure with base 
path in Spark SQL (#8271)
 add 41026ef1fea [HUDI-5289] Avoiding repeated trigger of clustering dag 
(#8275)

No new revisions were added by this update.

Summary of changes:
 .../action/commit/BaseCommitActionExecutor.java|  4 ++
 .../apache/hudi/functional/TestCOWDataSource.scala | 53 ++
 2 files changed, 57 insertions(+)



[GitHub] [hudi] nsivabalan merged pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag

2023-03-24 Thread via GitHub


nsivabalan merged PR #8275:
URL: https://github.com/apache/hudi/pull/8275





[GitHub] [hudi] nsivabalan commented on pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag

2023-03-24 Thread via GitHub


nsivabalan commented on PR #8275:
URL: https://github.com/apache/hudi/pull/8275#issuecomment-1483190292

   Hey @KnightChess,
   I'm not sure what your suggestion is.
   We already check isEmpty in SparkRDDWriteClient.
   
   ```
   private void validateClusteringCommit(HoodieWriteMetadata> clusteringMetadata, String clusteringCommitTime, HoodieTable table) {
     if (clusteringMetadata.getWriteStatuses().isEmpty()) {
       HoodieClusteringPlan clusteringPlan = ClusteringUtils.getClusteringPlan(
               table.getMetaClient(), HoodieTimeline.getReplaceCommitRequestedInstant(clusteringCommitTime))
           .map(Pair::getRight).orElseThrow(() -> new HoodieClusteringException(
               "Unable to read clustering plan for instant: " + clusteringCommitTime));
       throw new HoodieClusteringException("Clustering plan produced 0 WriteStatus for " + clusteringCommitTime
           + " #groups: " + clusteringPlan.getInputGroups().size() + " expected at least "
           + clusteringPlan.getInputGroups().stream().mapToInt(HoodieClusteringGroup::getNumOutputFileGroups).sum()
           + " write statuses");
     }
   }
   ```
   





[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483159402

   
   ## CI report:
   
   * 023778950aac15d0a5bcd57f5da2a5d7ffa2971f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15898)
 
   * 2b273f906891d2e4e9fea23c148eb524ae1c667e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15901)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483149756

   
   ## CI report:
   
   * 023778950aac15d0a5bcd57f5da2a5d7ffa2971f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15898)
 
   * 2b273f906891d2e4e9fea23c148eb524ae1c667e UNKNOWN
   
   



[jira] [Created] (HUDI-5980) Add tests to guard against repeated dag trigger using spark event listeners

2023-03-24 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5980:
-

 Summary: Add tests to guard against repeated dag trigger using 
spark event listeners
 Key: HUDI-5980
 URL: https://issues.apache.org/jira/browse/HUDI-5980
 Project: Apache Hudi
  Issue Type: Improvement
  Components: tests-ci
Reporter: sivabalan narayanan


As of now, we don't have a good way to guard against repeated DAG triggers. All 
of our existing tests only check the data, but with the reconcile strategy, the 
extra files are removed if some DAG was repeated, so the repeat goes unnoticed. 
We might need to add more tests to catch this in case someone changes the DAG 
in the future.

 

Eg test: [https://github.com/apache/hudi/pull/8275/files]
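
The idea of counting stage executions from a listener can be sketched roughly as below. This is a minimal model in plain Java and not code from the PR: `StageRunCounter` and its methods are hypothetical names, and the stage names are assumed to be passed in by the caller. In a real test, the callback would presumably be driven by a subclass of Spark's `SparkListener` overriding `onStageCompleted`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Spark listener: records how often each stage name completes. */
final class StageRunCounter {
  // Completion count per stage name, kept in first-seen order.
  private final Map<String, Integer> runsByStage = new LinkedHashMap<>();

  /** Call once per completed stage (in Spark, from an onStageCompleted callback). */
  void onStageCompleted(String stageName) {
    runsByStage.merge(stageName, 1, Integer::sum);
  }

  /** Returns the stage names that completed more than once, in first-seen order. */
  List<String> repeatedStages() {
    List<String> repeated = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : runsByStage.entrySet()) {
      if (entry.getValue() > 1) {
        repeated.add(entry.getKey());
      }
    }
    return repeated;
  }
}
```

A test could then assert that `repeatedStages()` is empty after a write. Stage names can repeat for legitimate reasons (retries, shared call sites), so a real guard would likely need a narrower key than the name alone.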

 

 





[GitHub] [hudi] nsivabalan commented on a diff in pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag

2023-03-24 Thread via GitHub


nsivabalan commented on code in PR #8275:
URL: https://github.com/apache/hudi/pull/8275#discussion_r1147858885


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java:
##
@@ -255,6 +257,8 @@ protected HoodieWriteMetadata> executeClustering(HoodieC
 .performClustering(clusteringPlan, schema, instantTime);
 HoodieData writeStatusList = writeMetadata.getWriteStatuses();
 HoodieData statuses = updateIndex(writeStatusList, writeMetadata);
+statuses.persist(config.getString(WRITE_STATUS_STORAGE_LEVEL_VALUE), context, HoodieData.HoodieDataCacheKey.of(config.getBasePath(), instantTime));

Review Comment:
   https://issues.apache.org/jira/browse/HUDI-5980






[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1483136033

   
   ## CI report:
   
   * 023778950aac15d0a5bcd57f5da2a5d7ffa2971f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15898)
 
   
   



[GitHub] [hudi] deepikaeswar95 commented on issue #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode

2023-03-24 Thread via GitHub


deepikaeswar95 commented on issue #8286:
URL: https://github.com/apache/hudi/issues/8286#issuecomment-1483067623

   @soumilshah1995 The common config file has been shared via the Hudi community Slack.





[GitHub] [hudi] deepikaeswar95 commented on issue #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode

2023-03-24 Thread via GitHub


deepikaeswar95 commented on issue #8286:
URL: https://github.com/apache/hudi/issues/8286#issuecomment-1483065843

   @soumilshah1995 I'm attaching the latest logs from a run in bulk insert mode, 
with Hudi-specific logging enabled.
   [Hudi logs -bulk insert.txt](https://github.com/apache/hudi/files/11064258/Hudi.logs.-bulk.insert.txt)
   





[GitHub] [hudi] deepikaeswar95 commented on issue #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode

2023-03-24 Thread via GitHub


deepikaeswar95 commented on issue #8286:
URL: https://github.com/apache/hudi/issues/8286#issuecomment-1483064629

   @soumilshah1995, I have tried loading partition by partition. We have data from 
21 Feb to date. When we load partition by partition using upsert, the data 
loads correctly, but when the delta streamer runs in continuous mode (upsert), 
it fails.
   





[GitHub] [hudi] 15663671003 opened a new issue, #8287: [SUPPORT]

2023-03-24 Thread via GitHub


15663671003 opened a new issue, #8287:
URL: https://github.com/apache/hudi/issues/8287

   I implemented a custom payload based on HoodieRecordPayload, but ran into a 
problem. With an incremental query, `record_time` is the value from the 
incoming payload (incorrect). With a snapshot query, `record_time` is the old 
value (correct). This does not meet my expectations. Does the payload returned 
by incremental queries differ from the snapshot query results? 
Please help me.
   ```java
   /* Omitted content */
   
   public class CustomPayload extends OverwriteWithLatestAvroPayload {
     /* Omitted content */
   
     @Override
     public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema, Properties properties) throws IOException {
       if (recordBytes.length == 0) {
         return Option.empty();
       }
   
       GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema);
       if (!needUpdatingPersistedRecord(currentValue, incomingRecord, properties)) {
         return Option.of(currentValue);
       }
   
       /* custom code */
       if (((GenericRecord) currentValue).get("record_time") != null) {
         incomingRecord.put("record_time", ((GenericRecord) currentValue).get("record_time"));
       }
   
       eventTime = updateEventTime(incomingRecord, properties);
       return isDeleteRecord(incomingRecord) ? Option.empty() : Option.of(incomingRecord);
     }
   
     /* Omitted content */
   
     protected boolean needUpdatingPersistedRecord(IndexedRecord currentValue,
         IndexedRecord incomingRecord, Properties properties) {
   
       /* Omitted content */
   
       return (((Comparable) persistedOrderingVal).compareTo(incomingOrderingVal) < 0)
           && (((GenericRecord) currentValue).get("valid").equals(true)
               || ((GenericRecord) incomingRecord).get("valid").equals(true))
           && (((GenericRecord) currentValue).get("content_md5") == null
               || !((GenericRecord) currentValue).get("content_md5").equals(((GenericRecord) incomingRecord).get("content_md5")));
     }
   
   }
   
   ```
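
The `record_time` rule in the custom code above can be modeled in isolation, which may help when checking what a merged record should contain. The sketch below is an assumption-laden simplification, not Hudi code: `RecordTimeMerge` is a hypothetical name, records are plain maps instead of Avro `GenericRecord`s, and the ordering, `valid`, and `content_md5` checks are left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the merge rule: the stored record_time wins over the incoming one. */
final class RecordTimeMerge {
  /** Returns a copy of incoming that carries over a non-null record_time from current. */
  static Map<String, Object> combine(Map<String, Object> current, Map<String, Object> incoming) {
    Map<String, Object> merged = new HashMap<>(incoming);
    Object storedTime = current.get("record_time");
    if (storedTime != null) {
      // Mirrors the "custom code" branch: keep the value already persisted.
      merged.put("record_time", storedTime);
    }
    return merged;
  }
}
```

Under this model the merged record keeps the old `record_time`, which matches the snapshot result reported below. The sketch does not explain why an incremental query would return the incoming value instead.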
   **Expected behavior**
   ```shell
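   >>> spark.read.format("hudi").load("*").filter("*").show(truncate=False)
   +-------------------+----------------------------+----------------------------------------------------------------+----------------------+----------------------------------+-----+-------------------+-------------------+--------------------------------+----------------------------------------------------------------+
   |_hoodie_commit_time|_hoodie_commit_seqno        |_hoodie_record_key                                              |_hoodie_partition_path|_hoodie_file_name                 |valid|update_time        |record_time        |content_md5                     |id                                                              |
   +-------------------+----------------------------+----------------------------------------------------------------+----------------------+----------------------------------+-----+-------------------+-------------------+--------------------------------+----------------------------------------------------------------+
   |20230324152704225  |20230324152704225_13_2563860|7306da3dd0c41ff504447981c4e850949db69524154c0c5bf85e62758babf3cc|                      |0013-ea30-4f9d-9704-e4f82fceb940-0|false|2023-03-24 14:48:09|2023-03-01 21:40:42|df3c2a9f8eaf5b8eec26b363cc67003f|7306da3dd0c41ff504447981c4e850949db69524154c0c5bf85e62758babf3cc|
   
   >>> df = spark.read.format("hudi").options(**{'hoodie.datasource.query.type': "incremental", "hoodie.datasource.read.begin.instanttime": '20230324151944584'}).load("**")
   >>> spark.read.format("hudi").load("*").filter("*").show(truncate=False)
   +-------------------+----------------------------+----------------------------------------------------------------+----------------------+----------------------------------+-----+-------------------+-------------------+--------------------------------+----------------------------------------------------------------+
   |_hoodie_commit_time|_hoodie_commit_seqno        |_hoodie_record_key                                              |_hoodie_partition_path|_hoodie_file_name                 |valid|update_time        |record_time        |content_md5                     |id                                                              |
   +-------------------+----------------------------+----------------------------------------------------------------+----------------------+----------------------------------+-----+-------------------+-------------------+--------------------------------+----------------------------------------------------------------+
   |20230324152704225  |20230324152704225_13_2563860|7306da3dd0c41ff50444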

[GitHub] [hudi] soumilshah1995 commented on issue #8260: [SUPPORT] How to implement incremental join

2023-03-24 Thread via GitHub


soumilshah1995 commented on issue #8260:
URL: https://github.com/apache/hudi/issues/8260#issuecomment-1483022972

   What do you think? Does this solve your issue?





[GitHub] [hudi] soumilshah1995 commented on issue #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode

2023-03-24 Thread via GitHub


soumilshah1995 commented on issue #8286:
URL: https://github.com/apache/hudi/issues/8286#issuecomment-1483021653

   I think it's making too many calls to S3.
   Can you try reading the data partition by partition?
   Can you share your configs?
   





[GitHub] [hudi] hudi-bot commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8284:
URL: https://github.com/apache/hudi/pull/8284#issuecomment-1482984410

   
   ## CI report:
   
   * 1b032ff4bd9e40fba4bf2bb318a1acaa3f7d0d87 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15897)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1482952270

   
   ## CI report:
   
   * 3fed80ecdcfabf29904025fa84e2d08505351189 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15896)
 
   
   



[GitHub] [hudi] stathismar commented on issue #8278: [SUPPORT] Deltastreamer Fails with AWSDmsAvroPayload

2023-03-24 Thread via GitHub


stathismar commented on issue #8278:
URL: https://github.com/apache/hudi/issues/8278#issuecomment-1482931370

   **Update:**
   I tried to submit the DeltaStreamer Spark job to a local Minikube cluster 
and the issue went away. I'm not sure what the problem was exactly. Most 
probably it has to do with an incompatibility between my system's Java version and 
Spark 3.3.1/Hudi. In order to run Spark with Hudi on K8s, I used the official 
Spark image and everything worked as expected.





[GitHub] [hudi] nicholas-fwang commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC

2023-03-24 Thread via GitHub


nicholas-fwang commented on PR #8284:
URL: https://github.com/apache/hudi/pull/8284#issuecomment-1482900107

   @codope thanks for the review.
   I'm trying to find the violation behind the failure of 
https://github.com/apache/hudi/actions/runs/4510471335/jobs/7943933966?pr=8284
   but I couldn't find any in my local build, which just succeeds. Do you know what the 
checkstyle violation in this PR is?
   Thanks.





[GitHub] [hudi] hudi-bot commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8277:
URL: https://github.com/apache/hudi/pull/8277#issuecomment-1482889660

   
   ## CI report:
   
   * cfc853da93c08d1317c9de4a7c4116ddc89e8344 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15894)
 
   * 8071f3f4a10a0aa0f3e295985aebdc5ed176e31c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15900)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8277:
URL: https://github.com/apache/hudi/pull/8277#issuecomment-1482875309

   
   ## CI report:
   
   * cfc853da93c08d1317c9de4a7c4116ddc89e8344 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15894)
 
   * 8071f3f4a10a0aa0f3e295985aebdc5ed176e31c UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8285: [HUDI-5979] Add dependencies to hudi-trino-bundle needed for Trino connector

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8285:
URL: https://github.com/apache/hudi/pull/8285#issuecomment-1482860942

   
   ## CI report:
   
   * a459c3d46e7357e0d921e562a2fc79c98b69e7dc Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15899)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8163: [HUDI-5921] Partition path should be considered in BucketIndexConcurrentFileWritesConflictResolutionStrategy

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8163:
URL: https://github.com/apache/hudi/pull/8163#issuecomment-1482859722

   
   ## CI report:
   
   * fa6a26972e75f23b195c24cd51619f6409b42c95 UNKNOWN
   * 7bb2b915a31acb05af30d6672fa755a4a1bc59ea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15895)
 
   
   



[GitHub] [hudi] codope closed issue #7589: [Support] Keep only clustered file(all) after cleaning

2023-03-24 Thread via GitHub


codope closed issue #7589: [Support] Keep only clustered file(all) after 
cleaning
URL: https://github.com/apache/hudi/issues/7589





[GitHub] [hudi] codope merged pull request #8271: [HUDI-5941] Support savepoint call procedure with base path in Spark SQL

2023-03-24 Thread via GitHub


codope merged PR #8271:
URL: https://github.com/apache/hudi/pull/8271





[hudi] branch master updated: [HUDI-5941] Support savepoint call procedure with base path in Spark SQL (#8271)

2023-03-24 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 6916803f7a4 [HUDI-5941] Support savepoint call procedure with base 
path in Spark SQL (#8271)
6916803f7a4 is described below

commit 6916803f7a40a4af57e7de1f927a4d5aa7025e32
Author: Y Ethan Guo 
AuthorDate: Fri Mar 24 06:45:58 2023 -0700

[HUDI-5941] Support savepoint call procedure with base path in Spark SQL 
(#8271)
---
 .../procedures/CreateSavepointProcedure.scala  |  8 ++-
 .../procedures/DeleteSavepointProcedure.scala  |  8 ++-
 .../procedures/RollbackToSavepointProcedure.scala  |  8 ++-
 .../procedures/ShowSavepointsProcedure.scala   |  6 +-
 .../hudi/procedure/TestSavepointsProcedure.scala   | 70 +++---
 5 files changed, 69 insertions(+), 31 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/CreateSavepointProcedure.scala
 
b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/CreateSavepointProcedure.scala
index e81b6f086a2..8a40cfb502d 100644
--- 
a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/CreateSavepointProcedure.scala
+++ 
b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/CreateSavepointProcedure.scala
@@ -28,10 +28,11 @@ import java.util.function.Supplier
 
 class CreateSavepointProcedure extends BaseProcedure with ProcedureBuilder 
with Logging {
   private val PARAMETERS = Array[ProcedureParameter](
-ProcedureParameter.required(0, "table", DataTypes.StringType, None),
+ProcedureParameter.optional(0, "table", DataTypes.StringType, None),
 ProcedureParameter.required(1, "commit_time", DataTypes.StringType, None),
 ProcedureParameter.optional(2, "user", DataTypes.StringType, ""),
-ProcedureParameter.optional(3, "comments", DataTypes.StringType, "")
+ProcedureParameter.optional(3, "comments", DataTypes.StringType, ""),
+ProcedureParameter.optional(4, "path", DataTypes.StringType, None)
   )
 
   private val OUTPUT_TYPE = new StructType(Array[StructField](
@@ -46,11 +47,12 @@ class CreateSavepointProcedure extends BaseProcedure with 
ProcedureBuilder with
 super.checkArgs(PARAMETERS, args)
 
 val tableName = getArgValueOrDefault(args, PARAMETERS(0))
+val tablePath = getArgValueOrDefault(args, PARAMETERS(4))
 val commitTime = getArgValueOrDefault(args, 
PARAMETERS(1)).get.asInstanceOf[String]
 val user = getArgValueOrDefault(args, 
PARAMETERS(2)).get.asInstanceOf[String]
 val comments = getArgValueOrDefault(args, 
PARAMETERS(3)).get.asInstanceOf[String]
 
-val basePath: String = getBasePath(tableName)
+val basePath: String = getBasePath(tableName, tablePath)
 val metaClient = 
HoodieTableMetaClient.builder.setConf(jsc.hadoopConfiguration()).setBasePath(basePath).build
 
 val activeTimeline: HoodieActiveTimeline = metaClient.getActiveTimeline
diff --git 
a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/DeleteSavepointProcedure.scala
 
b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/DeleteSavepointProcedure.scala
index 1cdd0638f1a..5d3b9b22285 100644
--- 
a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/DeleteSavepointProcedure.scala
+++ 
b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/DeleteSavepointProcedure.scala
@@ -28,8 +28,9 @@ import java.util.function.Supplier
 
 class DeleteSavepointProcedure extends BaseProcedure with ProcedureBuilder 
with Logging {
   private val PARAMETERS = Array[ProcedureParameter](
-ProcedureParameter.required(0, "table", DataTypes.StringType, None),
-ProcedureParameter.required(1, "instant_time", DataTypes.StringType, None)
+ProcedureParameter.optional(0, "table", DataTypes.StringType, None),
+ProcedureParameter.required(1, "instant_time", DataTypes.StringType, None),
+ProcedureParameter.optional(2, "path", DataTypes.StringType, None)
   )
 
   private val OUTPUT_TYPE = new StructType(Array[StructField](
@@ -44,9 +45,10 @@ class DeleteSavepointProcedure extends BaseProcedure with 
ProcedureBuilder with
 super.checkArgs(PARAMETERS, args)
 
 val tableName = getArgValueOrDefault(args, PARAMETERS(0))
+val tablePath = getArgValueOrDefault(args, PARAMETERS(2))
 val instantTime = getArgValueOrDefault(args, 
PARAMETERS(1)).get.asInstanceOf[String]
 
-val basePath: String = getBasePath(tableName)
+val basePath: String = getBasePath(tableName, tablePath)
 val metaClient = 
HoodieTableMetaClient.builder.setConf(jsc.hadoopConfiguration()).setBasePath(basePath).build
 
 val completedInstants = 
met

[GitHub] [hudi] codope commented on a diff in pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag

2023-03-24 Thread via GitHub


codope commented on code in PR #8275:
URL: https://github.com/apache/hudi/pull/8275#discussion_r1147590832


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java:
##
@@ -255,6 +257,8 @@ protected HoodieWriteMetadata> 
executeClustering(HoodieC
 .performClustering(clusteringPlan, schema, instantTime);
 HoodieData writeStatusList = writeMetadata.getWriteStatuses();
 HoodieData statuses = updateIndex(writeStatusList, 
writeMetadata);
+statuses.persist(config.getString(WRITE_STATUS_STORAGE_LEVEL_VALUE), 
context, HoodieData.HoodieDataCacheKey.of(config.getBasePath(), instantTime));

Review Comment:
   Good call, and thanks for adding a test using `StageEventManager`. Could you 
create a JIRA to add more such tests? We need DAG tests to guard against changes in the DAG.
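
Editor's sketch of why the `persist` call in the diff above matters: if a lazily evaluated result is consumed more than once without being cached, the work behind it runs again each time. This is a toy model in plain Java, not Hudi or Spark code; `PersistSketch`, `memoize`, and `runsFor` are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Toy model only: a Supplier stands in for a lazily evaluated DAG. Not Hudi code.
final class PersistSketch {

  // Wraps a supplier so the underlying computation runs at most once.
  static <T> Supplier<T> memoize(Supplier<T> source) {
    return new Supplier<T>() {
      private boolean computed;
      private T value;

      @Override
      public synchronized T get() {
        if (!computed) {
          value = source.get();
          computed = true;
        }
        return value;
      }
    };
  }

  // Returns how many times the "DAG" executed when its result is read `reads` times.
  static int runsFor(int reads, boolean persist) {
    AtomicInteger runs = new AtomicInteger();
    Supplier<String> dag = () -> {
      runs.incrementAndGet();
      return "write-statuses";
    };
    Supplier<String> statuses = persist ? memoize(dag) : dag;
    for (int i = 0; i < reads; i++) {
      statuses.get();
    }
    return runs.get();
  }

  public static void main(String[] args) {
    System.out.println("without persist: " + runsFor(3, false) + " runs");
    System.out.println("with persist:    " + runsFor(3, true) + " runs");
  }
}
```

A DAG test of the kind requested would assert on the run count in the same way, so that a later change which drops the caching step fails loudly.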






[GitHub] [hudi] deepikaeswar95 commented on issue #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode

2023-03-24 Thread via GitHub


deepikaeswar95 commented on issue #8286:
URL: https://github.com/apache/hudi/issues/8286#issuecomment-1482804011

   The Spark job fails when the delta streamer is run in upsert continuous mode or 
bulk insert mode.





[GitHub] [hudi] deepikaeswar95 opened a new issue, #8286: [SUPPORT] Spark job failing when delta streamer run in Bulk insert / Upsert continuous mode

2023-03-24 Thread via GitHub


deepikaeswar95 opened a new issue, #8286:
URL: https://github.com/apache/hudi/issues/8286

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version :
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] codope closed issue #8257: [SUPPORT]HoodieDeltaStreamer (0.13.0 ),FileSystem is null,resulting in a NullPointerException

2023-03-24 Thread via GitHub


codope closed issue #8257: [SUPPORT]HoodieDeltaStreamer (0.13.0 ),FileSystem is 
null,resulting in a NullPointerException
URL: https://github.com/apache/hudi/issues/8257





[GitHub] [hudi] codope commented on issue #8257: [SUPPORT]HoodieDeltaStreamer (0.13.0 ),FileSystem is null,resulting in a NullPointerException

2023-03-24 Thread via GitHub


codope commented on issue #8257:
URL: https://github.com/apache/hudi/issues/8257#issuecomment-1482802157

   Closing the issue as we have a fix.





[GitHub] [hudi] Zouxxyy commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-24 Thread via GitHub


Zouxxyy commented on PR #8277:
URL: https://github.com/apache/hudi/pull/8277#issuecomment-1482799152

   > @Zouxxyy Looks like there is a checkstyle violation. Can you please 
correct? 
https://github.com/apache/hudi/actions/runs/4509620758/jobs/7944047354?pr=8277
   
   done





[GitHub] [hudi] hudi-bot commented on pull request #8285: [HUDI-5979] Add dependencies to hudi-trino-bundle needed for Trino connector

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8285:
URL: https://github.com/apache/hudi/pull/8285#issuecomment-1482793073

   
   ## CI report:
   
   * a459c3d46e7357e0d921e562a2fc79c98b69e7dc UNKNOWN
   
   



[GitHub] [hudi] codope commented on a diff in pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-24 Thread via GitHub


codope commented on code in PR #8277:
URL: https://github.com/apache/hudi/pull/8277#discussion_r1147573904


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java:
##
@@ -168,8 +170,14 @@ protected  ClosableIterator> 
deserializeRecords(byte[] conten
 // Get schema from the header
 Schema writerSchema = new 
Schema.Parser().parse(super.getLogBlockHeader().get(HeaderMetadataType.SCHEMA));
 
+HoodieLogBlockContentLocation blockContentLoc = 
getBlockContentLocation().get();
+Configuration inlineConf = new 
Configuration(blockContentLoc.getHadoopConf());
+inlineConf.set("fs." + InLineFileSystem.SCHEME + ".impl", 
InLineFileSystem.class.getName());
+inlineConf.setClassLoader(InLineFileSystem.class.getClassLoader());
+
+FileSystem fs = FSUtils.getFs(pathForReader.toString(), inlineConf);

Review Comment:
   Sounds good @Zouxxyy, as long as we are OK with compatibility.
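
Editor's sketch of the pattern in the diff above: copy the configuration first, then register a file system implementation under the Hadoop-style key `fs.<scheme>.impl`, so the original configuration is left untouched. A plain `Map` stands in for the Hadoop `Configuration` here; the class name `SchemeConfSketch`, the scheme `demo`, and the class `example.DemoFileSystem` are hypothetical placeholders, not Hudi or Hadoop identifiers.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a Map stands in for a Hadoop Configuration. Not Hudi code.
final class SchemeConfSketch {

  // Builds the configuration key that maps a URI scheme to an implementation class.
  static String implKey(String scheme) {
    return "fs." + scheme + ".impl";
  }

  // Returns a copy of `base` with the scheme registered; `base` itself is not modified.
  static Map<String, String> withScheme(Map<String, String> base, String scheme, String implClass) {
    Map<String, String> copy = new HashMap<>(base);
    copy.put(implKey(scheme), implClass);
    return copy;
  }

  public static void main(String[] args) {
    Map<String, String> base = new HashMap<>();
    base.put("fs.defaultFS", "file:///");
    Map<String, String> scoped = withScheme(base, "demo", "example.DemoFileSystem");
    System.out.println("base has demo key:   " + base.containsKey(implKey("demo")));
    System.out.println("scoped has demo key: " + scoped.containsKey(implKey("demo")));
  }
}
```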






[jira] [Updated] (HUDI-5979) Replace individual hudi modules by hudi-trino-bundle in Trino Hudi connector

2023-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5979:
-
Labels: pull-request-available  (was: )

> Replace individual hudi modules by hudi-trino-bundle in Trino Hudi connector
> 
>
> Key: HUDI-5979
> URL: https://issues.apache.org/jira/browse/HUDI-5979
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.1
>
>
> Follow up to https://issues.apache.org/jira/browse/HUDI-3097



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] codope opened a new pull request, #8285: [HUDI-5979] Add dependencies to hudi-trino-bundle needed for Trino connector

2023-03-24 Thread via GitHub


codope opened a new pull request, #8285:
URL: https://github.com/apache/hudi/pull/8285

   ### Change Logs
   
   Add `hudi-client-common` and `hudi-java-client` to `hudi-trino-bundle`. 
Trino-Hudi connector makes use of the write client for some tests. Eventually, 
we want to add write capabilities as well. Have tested the bundle in 
[Trino](https://github.com/codope/trino/blob/upgrade-hudi-0.13.0/plugin/trino-hudi/pom.xml).
   
   ### Impact
   
   The bundle size grows by 1 MB, from 39.5 MB to 40.5 MB.
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default values of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] jonvex commented on a diff in pull request #8272: use path similar to base file when config is true

2023-03-24 Thread via GitHub


jonvex commented on code in PR #8272:
URL: https://github.com/apache/hudi/pull/8272#discussion_r1147558900


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##
@@ -332,8 +333,13 @@ private HoodieData> 
readRecordsForGroupBaseFiles(JavaSparkContex
   List>> iteratorsForPartition = new 
ArrayList<>();
   clusteringOpsPartition.forEachRemaining(clusteringOp -> {
 try {
+  boolean isBootstrapSkeleton = 
!clusteringOp.getBootstrapFilePath().isEmpty();
   Schema readerSchema = HoodieAvroUtils.addMetadataFields(new 
Schema.Parser().parse(writeConfig.getSchema()));
   HoodieFileReader baseFileReader = 
HoodieFileReaderFactory.getReaderFactory(recordType).getFileReader(hadoopConf.get(),
 new Path(clusteringOp.getDataFilePath()));
+  if (isBootstrapSkeleton) {

Review Comment:
   Yes






[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution causing table …

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1482770247

   
   ## CI report:
   
   * 54fd6e37af699de1add3d92b57b6a3437623feb3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15893)
 
   * 023778950aac15d0a5bcd57f5da2a5d7ffa2971f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15898)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8277:
URL: https://github.com/apache/hudi/pull/8277#issuecomment-1482770134

   
   ## CI report:
   
   * cfc853da93c08d1317c9de4a7c4116ddc89e8344 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15894)
 
   
   



[GitHub] [hudi] codope commented on a diff in pull request #8272: use path similar to base file when config is true

2023-03-24 Thread via GitHub


codope commented on code in PR #8272:
URL: https://github.com/apache/hudi/pull/8272#discussion_r1147541565


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##
@@ -332,8 +333,13 @@ private HoodieData> 
readRecordsForGroupBaseFiles(JavaSparkContex
   List>> iteratorsForPartition = new 
ArrayList<>();
   clusteringOpsPartition.forEachRemaining(clusteringOp -> {
 try {
+  boolean isBootstrapSkeleton = 
!clusteringOp.getBootstrapFilePath().isEmpty();
   Schema readerSchema = HoodieAvroUtils.addMetadataFields(new 
Schema.Parser().parse(writeConfig.getSchema()));
   HoodieFileReader baseFileReader = 
HoodieFileReaderFactory.getReaderFactory(recordType).getFileReader(hadoopConf.get(),
 new Path(clusteringOp.getDataFilePath()));
+  if (isBootstrapSkeleton) {

Review Comment:
   So, for full bootstrap mode, it still goes through the usual base file 
reader, correct?



##
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieBootstrapFileReader.java:
##
@@ -0,0 +1,88 @@
+package org.apache.hudi.io.storage;
+
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.bloom.BloomFilter;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.MetadataValues;
+import org.apache.hudi.common.util.collection.ClosableIterator;
+
+import org.apache.avro.Schema;
+
+import java.io.IOException;
+import java.util.Set;
+
+
+public class HoodieBootstrapFileReader implements HoodieFileReader {
+
+  private HoodieFileReader skeletonFileReader;
+  private HoodieFileReader dataFileReader;
+  private Boolean isConsistentLogicalTimestampEnabled;
+
+  public HoodieBootstrapFileReader(HoodieFileReader skeletonFileReader, 
HoodieFileReader dataFileReader, Boolean 
isConsistentLogicalTimestampEnabled) {
+this.skeletonFileReader = skeletonFileReader;
+this.dataFileReader = dataFileReader;
+this.isConsistentLogicalTimestampEnabled = 
isConsistentLogicalTimestampEnabled;
+  }
+  @Override
+  public String[] readMinMaxRecordKeys() {
+return skeletonFileReader.readMinMaxRecordKeys();
+  }
+
+  @Override
+  public BloomFilter readBloomFilter() {
+return skeletonFileReader.readBloomFilter();
+  }
+
+  @Override
+  public Set filterRowKeys(Set candidateRowKeys) {
+return skeletonFileReader.filterRowKeys(candidateRowKeys);
+  }
+
+  @Override
+  public ClosableIterator> getRecordIterator(Schema 
readerSchema, Schema requestedSchema) throws IOException {
+ClosableIterator> skeletonIterator = 
skeletonFileReader.getRecordIterator(readerSchema, requestedSchema);
+ClosableIterator> dataFileIterator = 
dataFileReader.getRecordIterator(HoodieAvroUtils.removeMetadataFields(readerSchema),
 requestedSchema);
+
+return new ClosableIterator>() {
+  @Override
+  public void close() {
+  skeletonIterator.close();
+  dataFileIterator.close();
+  }
+
+  @Override
+  public boolean hasNext() {
+return skeletonIterator.hasNext() && dataFileIterator.hasNext();
+  }
+
+  @Override
+  public HoodieRecord next() {
+HoodieRecord dataRecord = dataFileIterator.next();
+HoodieRecord skeletonRecord = skeletonIterator.next();
+HoodieRecord ret = dataRecord.prependMetaFields(readerSchema, 
readerSchema, new MetadataValues().
+setCommitTime(skeletonRecord.getRecordKey(readerSchema, 
HoodieRecord.COMMIT_TIME_METADATA_FIELD ))
+.setCommitSeqno(skeletonRecord.getRecordKey(readerSchema, 
HoodieRecord.COMMIT_SEQNO_METADATA_FIELD))
+.setRecordKey(skeletonRecord.getRecordKey(readerSchema, 
HoodieRecord.RECORD_KEY_METADATA_FIELD))
+.setPartitionPath(skeletonRecord.getRecordKey(readerSchema, 
HoodieRecord.PARTITION_PATH_METADATA_FIELD))

Review Comment:
   Is the skeleton record giving the partition path?
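
Editor's sketch of the iteration pattern in the diff above: two iterators are advanced in lockstep and each pair of elements is merged into one record, which assumes both sides list rows in the same order. This is a minimal illustration in plain Java, not the Hudi implementation; `LockstepZipSketch`, `zip`, and `mergeAll` are hypothetical names, and strings stand in for the skeleton (metadata) and data records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Toy illustration only: these names are hypothetical and not part of Hudi.
final class LockstepZipSketch {

  // Pairs the i-th element of `meta` with the i-th element of `data`.
  static <A, B, R> Iterator<R> zip(Iterator<A> meta, Iterator<B> data, BiFunction<A, B, R> merge) {
    return new Iterator<R>() {
      @Override
      public boolean hasNext() {
        // Ends as soon as either side is exhausted, like the hasNext() in the diff.
        return meta.hasNext() && data.hasNext();
      }

      @Override
      public R next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return merge.apply(meta.next(), data.next());
      }
    };
  }

  // Merges metadata rows and data rows position by position into "meta|data" strings.
  static List<String> mergeAll(List<String> metaRows, List<String> dataRows) {
    List<String> out = new ArrayList<>();
    Iterator<String> merged = zip(metaRows.iterator(), dataRows.iterator(), (m, d) -> m + "|" + d);
    merged.forEachRemaining(out::add);
    return out;
  }

  public static void main(String[] args) {
    System.out.println(mergeAll(Arrays.asList("key1", "key2"), Arrays.asList("row1", "row2")));
  }
}
```

One consequence worth noting for review: with `&&` in `hasNext()`, a length mismatch between the two files is silently truncated rather than reported.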






[GitHub] [hudi] KnightChess commented on issue #8283: [SUPPORT] In version 0.13.0, when using dynamic partition to insert overwrite data, the table will be cleared first, and then the corresponding pa

2023-03-24 Thread via GitHub


KnightChess commented on issue #8283:
URL: https://github.com/apache/hudi/issues/8283#issuecomment-1482748957

   @nsivabalan @yihua @XuQianJin-Stars @weimingdiit I think this needs a reminder in the 
docs or an added check in 0.13.1. What do you think?





[GitHub] [hudi] KnightChess commented on issue #8283: [SUPPORT] In version 0.13.0, when using dynamic partition to insert overwrite data, the table will be cleared first, and then the corresponding pa

2023-03-24 Thread via GitHub


KnightChess commented on issue #8283:
URL: https://github.com/apache/hudi/issues/8283#issuecomment-1482745990

   #7365 looks like the PR that changed the dynamic behavior. Before it, Hudi's 
overwrite was always dynamic, and the release notes at 
`https://hudi.apache.org/releases/release-0.13.0` do not mention the change. It 
can cause serious data problems after an upgrade to 0.13.0: a user can delete 
all data by mistake. Maybe Hudi needs a config that makes users aware of this 
behavior or that limits overwriting the whole table.
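
To make the difference concrete, here is a toy model of the two behaviors described above. It is only a sketch: the table is a plain in-memory map from partition path to rows, and `OverwriteSemanticsSketch`, `dynamicOverwrite`, and `wholeTableOverwrite` are hypothetical names for illustration, not Hudi APIs or configs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OverwriteSemanticsSketch {

    // Dynamic overwrite: only partitions present in the incoming batch are replaced;
    // partitions that the batch does not mention keep their data.
    static Map<String, List<String>> dynamicOverwrite(
            Map<String, List<String>> table, Map<String, List<String>> incoming) {
        Map<String, List<String>> result = new TreeMap<>(table);
        result.putAll(incoming);
        return result;
    }

    // Whole-table overwrite: every existing partition is dropped first,
    // then only the incoming batch is written.
    static Map<String, List<String>> wholeTableOverwrite(
            Map<String, List<String>> table, Map<String, List<String>> incoming) {
        return new TreeMap<>(incoming);
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new TreeMap<>();
        table.put("dt=2023-03-23", Arrays.asList("a", "b"));
        table.put("dt=2023-03-24", Arrays.asList("c"));

        Map<String, List<String>> incoming = new TreeMap<>();
        incoming.put("dt=2023-03-24", Arrays.asList("d"));

        System.out.println("dynamic:     " + dynamicOverwrite(table, incoming).keySet());
        System.out.println("whole table: " + wholeTableOverwrite(table, incoming).keySet());
    }
}
```

In this model the dynamic variant keeps the untouched `dt=2023-03-23` partition, while the whole-table variant leaves only the partition that was written, which is the data loss a user could hit by mistake.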





[jira] [Created] (HUDI-5979) Replace individual hudi modules by hudi-trino-bundle in Trino Hudi connector

2023-03-24 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-5979:
-

 Summary: Replace individual hudi modules by hudi-trino-bundle in 
Trino Hudi connector
 Key: HUDI-5979
 URL: https://issues.apache.org/jira/browse/HUDI-5979
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Sagar Sumit
Assignee: Sagar Sumit
 Fix For: 0.13.1


Follow up to https://issues.apache.org/jira/browse/HUDI-3097



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution causing table …

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1482721544

   
   ## CI report:
   
   * 54fd6e37af699de1add3d92b57b6a3437623feb3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15893)
 
   * 023778950aac15d0a5bcd57f5da2a5d7ffa2971f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution causing table …

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8280:
URL: https://github.com/apache/hudi/pull/8280#issuecomment-1482712698

   
   ## CI report:
   
   * 54fd6e37af699de1add3d92b57b6a3437623feb3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15893)
 
   
   



[GitHub] [hudi] kazdy commented on issue #8261: [SUPPORT] How to reduce hoodie commit latency

2023-03-24 Thread via GitHub


kazdy commented on issue #8261:
URL: https://github.com/apache/hudi/issues/8261#issuecomment-1482653969

   The AWS EMR team provided me with a patched Hudi 0.12.1 jar. You can ask AWS 
support for it and for instructions on how to provide it to the cluster.





[GitHub] [hudi] hudi-bot commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8284:
URL: https://github.com/apache/hudi/pull/8284#issuecomment-1482647182

   
   ## CI report:
   
   * 1b032ff4bd9e40fba4bf2bb318a1acaa3f7d0d87 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15897)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8227: [HUDI-5952] Fix NPE when use kafka callback

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8227:
URL: https://github.com/apache/hudi/pull/8227#issuecomment-1482646921

   
   ## CI report:
   
   * 4cc7ab6ab87a640bcb68c97c55f642fde9ed5ecc UNKNOWN
   * 36abd1831338c963296e82e37d502db0deb5cc3b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15891)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8284:
URL: https://github.com/apache/hudi/pull/8284#issuecomment-1482638115

   
   ## CI report:
   
   * 1b032ff4bd9e40fba4bf2bb318a1acaa3f7d0d87 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8231: [HUDI-5963] Release 0.13.1 prep

2023-03-24 Thread via GitHub


hudi-bot commented on PR #8231:
URL: https://github.com/apache/hudi/pull/8231#issuecomment-1482626905

   
   ## CI report:
   
   * f59475005e6bfd827761e39f44cfca547654f1ff Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15889)
 
   
   



[jira] [Updated] (HUDI-5978) spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC

2023-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5978:
-
Labels: pull-request-available  (was: )

> spark timeline timezone is not updated when hoodie.table.timeline.timezone is 
> UTC
> -
>
> Key: HUDI-5978
> URL: https://issues.apache.org/jira/browse/HUDI-5978
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Reporter: inki hwang
>Priority: Minor
>  Labels: pull-request-available
>
> The commit timezone is not updated in the HoodieSparkSqlWriter write method.
> For example, when the LOCAL time zone is KST (UTC+9), even if 
> 'hoodie.table.timeline.timezone' is UTC, the first instant time is created 
> in LOCAL (KST) and then initTable is called.
> Then the second instant time, created after initTable, is in UTC and has to 
> wait because the first instant time (KST) is 9 hours ahead of it.
> In the other case, when the table is already initialized, the write method 
> does not call setCommitTimezone.





[GitHub] [hudi] nicholas-fwang opened a new pull request, #8284: [HUDI-5978] spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC

2023-03-24 Thread via GitHub


nicholas-fwang opened a new pull request, #8284:
URL: https://github.com/apache/hudi/pull/8284

   ### Change Logs
   
   Create the instant time after setCommitTimezone if the table exists, or after 
initTable if it does not.
   
   ### Impact
   
   When hoodie.table.timeline.timezone is UTC but the LOCAL timezone is not UTC, 
timeline actions do not progress.
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   The commit timezone is not updated in the HoodieSparkSqlWriter write method.
   
   For example, when the LOCAL time zone is KST (UTC+9), even if 
'hoodie.table.timeline.timezone' is UTC, the first instant time is created in 
LOCAL (KST) and then initTable is called.
   
   Then the second instant time, created after initTable, is in UTC and has to 
wait because the first instant time (KST) is 9 hours ahead of it.
   
   In the other case, when the table is already initialized, the write method 
does not call setCommitTimezone.
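
The ordering problem can be reproduced with plain `java.time`, without any Hudi code. The sketch below is an illustration under assumptions: the `yyyyMMddHHmmssSSS` pattern, the class name `InstantTimeZoneSketch`, and the helper `instantTime` are hypothetical and chosen only to show how a timestamp string formatted in KST sorts after one formatted in UTC a moment later.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class InstantTimeZoneSketch {

    // Assumed timestamp layout, used here for illustration only.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // Formats a point in time as a sortable string in the given zone.
    static String instantTime(Instant now, ZoneId zone) {
        return FORMAT.withZone(zone).format(now);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2023-03-24T00:00:00Z");

        // First instant time: created before the timezone setting is applied (LOCAL = KST).
        String first = instantTime(now, ZoneId.of("Asia/Seoul"));
        // Second instant time: created one second later, after the switch to UTC.
        String second = instantTime(now.plusSeconds(1), ZoneOffset.UTC);

        System.out.println("first  (KST): " + first);
        System.out.println("second (UTC): " + second);
        System.out.println("second sorts before first: " + (second.compareTo(first) < 0));
    }
}
```

Although the second timestamp is taken later in real time, its string compares lower than the first, so any logic that expects instant times to increase has to wait roughly nine hours for the clock to catch up.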
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-5978) spark timeline timezone is not updated when hoodie.table.timeline.timezone is UTC

2023-03-24 Thread inki hwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

inki hwang updated HUDI-5978:
-
Summary: spark timeline timezone is not updated when 
hoodie.table.timeline.timezone is UTC  (was: spark timeline timezone is not 
updated when hoodie.table.timeline.timezone is not UTC)

> spark timeline timezone is not updated when hoodie.table.timeline.timezone is 
> UTC
> -
>
> Key: HUDI-5978
> URL: https://issues.apache.org/jira/browse/HUDI-5978
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Reporter: inki hwang
>Priority: Minor
>
> The commit timezone is not updated in the HoodieSparkSqlWriter write method.
> For example, when the LOCAL time zone is KST (UTC+9), even if 
> 'hoodie.table.timeline.timezone' is UTC, the first instant time is created 
> in LOCAL (KST) and then initTable is called.
> Then the second instant time, created after initTable, is in UTC and has to 
> wait because the first instant time (KST) is 9 hours ahead of it.
> In the other case, when the table is already initialized, the write method 
> does not call setCommitTimezone.





[jira] [Created] (HUDI-5978) spark timeline timezone is not updated when hoodie.table.timeline.timezone is not UTC

2023-03-24 Thread inki hwang (Jira)
inki hwang created HUDI-5978:


 Summary: spark timeline timezone is not updated when 
hoodie.table.timeline.timezone is not UTC
 Key: HUDI-5978
 URL: https://issues.apache.org/jira/browse/HUDI-5978
 Project: Apache Hudi
  Issue Type: Bug
  Components: spark
Reporter: inki hwang


The commit timezone is not updated in the HoodieSparkSqlWriter write method.

For example, when the LOCAL time zone is KST (UTC+9), even if 
'hoodie.table.timeline.timezone' is UTC, the first instant time is created in 
LOCAL (KST) and then initTable is called.

Then the second instant time, created after initTable, is in UTC and has to 
wait because the first instant time (KST) is 9 hours ahead of it.

In the other case, when the table is already initialized, the write method 
does not call setCommitTimezone.





[GitHub] [hudi] weimingdiit opened a new issue, #8283: [SUPPORT] In version 0.13.0, when using dynamic partition to write data, the table will be cleared first, and then the corresponding partition da

2023-03-24 Thread via GitHub


weimingdiit opened a new issue, #8283:
URL: https://github.com/apache/hudi/issues/8283

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version :
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   





[GitHub] [hudi] haripriyarhp commented on issue #8153: [SUPPORT] Async Clustering failing for MoR in 0.13.0

2023-03-24 Thread via GitHub


haripriyarhp commented on issue #8153:
URL: https://github.com/apache/hudi/issues/8153#issuecomment-1482577064

   @nsivabalan: Yes, async compaction is happening without any failures 
(though there are performance issues). But async clustering is not working. I 
even tried today with a newly created table, and it throws the same error as above.




