Re: [PR] [HUDI-6898] Medatawriter closing in tests, update logging [hudi]

2023-10-23 Thread via GitHub


yihua commented on code in PR #9768:
URL: https://github.com/apache/hudi/pull/9768#discussion_r1369690946


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java:
##
@@ -3545,6 +3546,7 @@ private List getAllFiles(HoodieTableMetadata 
metadata) throws Exception {
 return allfiles;
   }
 
+  // TODO

Review Comment:
   nit: should be removed?



##
pom.xml:
##
@@ -115,7 +115,7 @@
 2.17.2
 1.7.36
 2.9.9
-2.10.1
+2.10.2

Review Comment:
   Avoid version upgrade in this PR?



##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieMergeOnReadSnapshotReader.java:
##
@@ -137,7 +137,8 @@ public HoodieMergeOnReadSnapshotReader(String tableBasePath,
 }
   }
 }
-LOG.debug("Time taken to merge base file and log file records: {}", 
timer.endTimer());
+long executionTime = timer.endTimer();
+LOG.debug("Time taken to merge base file and log file records: {}", 
executionTime);

Review Comment:
   nit: no need to change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6932] Updated batch size for delete partitions for Glue sync tool [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9842:
URL: https://github.com/apache/hudi/pull/9842#issuecomment-1776625853

   
   ## CI report:
   
   * 10d1cad3a2625c7276c6d8d04c4c258f732e9af8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20270)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6932] Updated batch size for delete partitions for Glue sync tool [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9842:
URL: https://github.com/apache/hudi/pull/9842#issuecomment-1776617382

   
   ## CI report:
   
   * 10d1cad3a2625c7276c6d8d04c4c258f732e9af8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6896] HoodieAvroHFileReader.RecordIterator iteration never terminates [hudi]

2023-10-23 Thread via GitHub


yihua commented on code in PR #9789:
URL: https://github.com/apache/hudi/pull/9789#discussion_r1369683789


##
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroHFileReader.java:
##
@@ -684,6 +685,10 @@ private static class RecordIterator implements ClosableIterator<IndexedRecord> {
 public boolean hasNext() {
   try {
 // NOTE: This is required for idempotency
+if (eof) {
+  return false;
+}

Review Comment:
   Under what condition does the infinite iteration happen?  How to reproduce 
it in a test?
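For readers following along: the `eof` guard makes `hasNext()` idempotent, so that once the underlying scanner is exhausted, repeated calls keep returning `false` instead of re-polling the scanner. A minimal, self-contained sketch of the pattern (plain Java with a hypothetical `ScannerBackedIterator`, not the actual `HoodieAvroHFileReader` code):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch of an iterator over a scanner that is not safe to poll again after
// exhaustion; the eof flag makes hasNext() idempotent.
class ScannerBackedIterator implements Iterator<Integer> {
  private final int[] scanner;   // stands in for the HFile scanner
  private int pos = 0;
  private Integer nextValue = null;
  private boolean eof = false;

  ScannerBackedIterator(int[] scanner) {
    this.scanner = scanner;
  }

  @Override
  public boolean hasNext() {
    // Idempotency guard: once EOF is observed, never touch the scanner again.
    if (eof) {
      return false;
    }
    if (nextValue != null) {
      return true;  // a value was already fetched and not yet consumed
    }
    if (pos >= scanner.length) {
      eof = true;
      return false;
    }
    nextValue = scanner[pos++];
    return true;
  }

  @Override
  public Integer next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    Integer v = nextValue;
    nextValue = null;
    return v;
  }
}
```

Without such a guard, a caller invoking `hasNext()` again after exhaustion could re-drive the scanner and, depending on its seek semantics, never terminate; with it, repeated `hasNext()` calls are cheap and stable.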






Re: [PR] [HUDI-6959] Bulk insert V2 do not rollback failed instant on abort [hudi]

2023-10-23 Thread via GitHub


stream2000 commented on PR #9887:
URL: https://github.com/apache/hudi/pull/9887#issuecomment-1776608491

   > so in such case, files are always created already?
   
   @boneanxs We are still checking the Spark source code to confirm the 
mechanism.  However, in my local test, we did find that new files were 
written after the rollback was scheduled. You can add a breakpoint at the 
`abort` method and run the test to reproduce it locally. 
   
   





Re: [PR] [HUDI-6959] Bulk insert V2 do not rollback failed instant on abort [hudi]

2023-10-23 Thread via GitHub


stream2000 commented on PR #9887:
URL: https://github.com/apache/hudi/pull/9887#issuecomment-1776604419

   > in the test I don't see explicit failure injection. How is the abort 
called and is it deterministically triggered in the test? 
   
   ```java
 // We can only upsert to existing consistent hashing bucket index 
table
 checkExceptionContain(insertStatement)("Consistent Hashing 
bulk_insert only support write to new file group")
   ```
   
   @yihua  We don't allow bulk insert into a consistent hashing index table that 
already has parquet files, because bulk insert V2 only supports writing parquet 
for now. So bulk inserting into the table will cause an exception, and it 
happens deterministically. 
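The `checkExceptionContain` helper shown above asserts that a statement fails with a message containing the given substring, which is what makes the abort path deterministic in the test. A plain-Java analog of that assertion pattern (hypothetical names, not the actual Hudi test utility):

```java
// Runs an action expected to fail and checks the error message contains a substring.
class ExceptionAsserts {
  interface ThrowingRunnable {
    void run() throws Exception;
  }

  static void checkExceptionContain(ThrowingRunnable action, String expectedFragment) {
    try {
      action.run();
    } catch (Exception e) {
      if (e.getMessage() != null && e.getMessage().contains(expectedFragment)) {
        return;  // expected failure observed
      }
      throw new AssertionError("Unexpected message: " + e.getMessage(), e);
    }
    // Reaching here means the action succeeded, which the test treats as a failure.
    throw new AssertionError("Expected an exception containing: " + expectedFragment);
  }
}
```

The test therefore passes only when the write fails with the expected message, never when it silently succeeds.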
   
   
   





Re: [PR] [HUDI-6959] Bulk insert V2 do not rollback failed instant on abort [hudi]

2023-10-23 Thread via GitHub


yihua commented on PR #9887:
URL: https://github.com/apache/hudi/pull/9887#issuecomment-1776599636

   @stream2000 in the test I don't see explicit failure injection.  How is the 
`abort` called and is it deterministically triggered in the test?





Re: [PR] [HUDI-6482] Supports new compaction strategy DayBasedAndBoundedIOCompactionStrategy [hudi]

2023-10-23 Thread via GitHub


yihua commented on PR #9126:
URL: https://github.com/apache/hudi/pull/9126#issuecomment-1776582776

   @ksmou could you try reopening the PR on your side?  I'm not able to reopen it.





Re: [PR] [HUDI-6960] Support read partition values from path when schema evolution enabled [hudi]

2023-10-23 Thread via GitHub


yihua commented on code in PR #9889:
URL: https://github.com/apache/hudi/pull/9889#discussion_r1369661651


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala:
##
@@ -149,27 +152,10 @@ case class BaseFileOnlyRelation(override val sqlContext: 
SQLContext,
 val enableFileIndex = HoodieSparkConfUtils.getConfigValue(optParams, 
sparkSession.sessionState.conf,
   ENABLE_HOODIE_FILE_INDEX.key, 
ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean
 if (enableFileIndex && globPaths.isEmpty) {
-  // NOTE: There are currently 2 ways partition values could be fetched:
-  //  - Source columns (producing the values used for physical 
partitioning) will be read
-  //  from the data file
-  //  - Values parsed from the actual partition path would be 
appended to the final dataset
-  //
-  //In the former case, we don't need to provide the 
partition-schema to the relation,
-  //therefore we simply stub it w/ empty schema and use full 
table-schema as the one being
-  //read from the data file.

Review Comment:
   @wecharyu `shouldExtractPartitionValuesFromPartitionPath` can still return 
`false` based on the superclass implementation?



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestGetPartitionValuesFromPath.scala:
##
@@ -90,4 +90,37 @@ class TestGetPartitionValuesFromPath extends 
HoodieSparkSqlTestBase {
   }
 }
   }
+
+  test("Test get partition values from path when schema evolution applied") {
+withTable(generateTableName) { tableName =>
+  spark.sql(
+s"""
+   |create table $tableName (
+   | id int,
+   | name string,
+   | ts bigint,
+   | region string,
+   | dt date
+   |) using hudi
+   |tblproperties (
+   | primaryKey = 'id',
+   | type = 'cow',
+   | preCombineField = 'ts',
+   | hoodie.datasource.write.drop.partition.columns = 'true'
+   |)
+   |partitioned by (region, dt)""".stripMargin)
+
+  spark.sql(s"insert into $tableName partition (region='reg1', 
dt='2023-10-01') select 1, 'name1', 1000")
+  checkAnswer(s"select id, name, ts, region, cast(dt as string) from 
$tableName")(
+Seq(1, "name1", 1000, "reg1", "2023-10-01")
+  )

Review Comment:
   When writing the table, `hoodie.schema.on.read.enable=true` should also be 
set to enable schema evolution on read.






Re: [PR] [HUDI-6963] Fix class conflict of CreateIndex from Spark3.3 [hudi]

2023-10-23 Thread via GitHub


yihua commented on code in PR #9895:
URL: https://github.com/apache/hudi/pull/9895#discussion_r1369644417


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestIndexSyntax.scala:
##
@@ -28,59 +29,61 @@ import 
org.apache.spark.sql.hudi.command.{CreateIndexCommand, DropIndexCommand,
 class TestIndexSyntax extends HoodieSparkSqlTestBase {
 
   test("Test Create/Drop/Show/Refresh Index") {
-withTempDir { tmp =>
-  Seq("cow", "mor").foreach { tableType =>
-val databaseName = "default"
-val tableName = generateTableName
-val basePath = s"${tmp.getCanonicalPath}/$tableName"
-spark.sql(
-  s"""
- |create table $tableName (
- |  id int,
- |  name string,
- |  price double,
- |  ts long
- |) using hudi
- | options (
- |  primaryKey ='id',
- |  type = '$tableType',
- |  preCombineField = 'ts'
- | )
- | partitioned by(ts)
- | location '$basePath'
+if (HoodieSparkUtils.gteqSpark3_2) {

Review Comment:
   Looks like `TestSecondaryIndex` should also have a precondition on the spark 
version. 



##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/IndexCommands.scala:
##
@@ -32,23 +31,21 @@ import 
org.apache.spark.sql.hudi.HoodieSqlCommonUtils.getTableLocation
 import org.apache.spark.sql.{Row, SparkSession}
 
 import java.util
-
 import scala.collection.JavaConverters.{collectionAsScalaIterableConverter, 
mapAsJavaMapConverter}
 
 case class CreateIndexCommand(table: CatalogTable,
   indexName: String,
   indexType: String,
   ignoreIfExists: Boolean,
-  columns: Seq[(Attribute, Map[String, String])],
-  options: Map[String, String],
-  override val output: Seq[Attribute]) extends 
IndexBaseCommand {
+  columns: Seq[(Seq[String], Map[String, String])],
+  options: Map[String, String]) extends 
IndexBaseCommand {
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
 val tableId = table.identifier
 val metaClient = createHoodieTableMetaClient(tableId, sparkSession)
 val columnsMap: java.util.LinkedHashMap[String, java.util.Map[String, 
String]] =
   new util.LinkedHashMap[String, java.util.Map[String, String]]()
-columns.map(c => columnsMap.put(c._1.name, c._2.asJava))
+columns.map(c => columnsMap.put(c._1.mkString("."), c._2.asJava))

Review Comment:
   Why change this?  for nested fields?



##
hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/parser/HoodieSpark3_3ExtendedSqlAstBuilder.scala:
##
@@ -3327,6 +3327,145 @@ class HoodieSpark3_3ExtendedSqlAstBuilder(conf: 
SQLConf, delegate: ParserInterfa
   position = Option(ctx.colPosition).map(pos =>
 UnresolvedFieldPosition(typedVisit[ColumnPosition](pos
   }
+
+  /**

Review Comment:
   I assume the SQL parsing of INDEX SQL statement should not be different 
across Spark versions.



##
hudi-spark-datasource/hudi-spark3.2.x/src/main/scala/org/apache/spark/sql/parser/HoodieSpark3_2ExtendedSqlAstBuilder.scala:
##
@@ -3317,6 +3317,145 @@ class HoodieSpark3_2ExtendedSqlAstBuilder(conf: 
SQLConf, delegate: ParserInterfa
   position = Option(ctx.colPosition).map(pos =>
 UnresolvedFieldPosition(typedVisit[ColumnPosition](pos
   }
+
+   /**

Review Comment:
   Got it.  So at least CreateIndex is still supported in Spark 3.2.



##
hudi-spark-datasource/hudi-spark/src/main/antlr4/org/apache/hudi/spark/sql/parser/HoodieSqlCommon.g4:
##
@@ -135,51 +120,13 @@
  nonReserved
  : CALL
  | COMPACTION
- | CREATE
- | DROP
- | EXISTS
- | FROM
- | IN
- | INDEX
- | INDEXES
- | IF

Review Comment:
   Do we still need some of these tokens for other SQL statements?



##
hudi-spark-datasource/hudi-spark3.3.x/src/main/antlr4/org/apache/hudi/spark/sql/parser/HoodieSqlBase.g4:
##
@@ -29,5 +29,12 @@ statement
 | createTableHeader ('(' colTypeList ')')? tableProvider?
 createTableClauses
 (AS? query)?   
#createTable
+| CREATE INDEX (IF NOT EXISTS)? identifier ON TABLE?

Review Comment:
   Could we still maintain the grammar in a single place for all Spark 
versions, but fail the logical plan of INDEX SQL statement in Spark 3.1 and 
below, so the grammar can be easily maintained?



##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/parser/HoodieSqlCommonAstBuilder.scala:
##
@@ -149,144 +149,4 @@ class HoodieSqlCommonAstBuilder(session: SparkSession, 
delegate: ParserI

Re: [PR] [HUDI-6963] Fix class conflict of CreateIndex from Spark3.3 [hudi]

2023-10-23 Thread via GitHub


yihua commented on PR #9895:
URL: https://github.com/apache/hudi/pull/9895#issuecomment-1776548151

   cc @codope 





Re: [I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]

2023-10-23 Thread via GitHub


ad1happy2go commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1776508692

   @arunvasudevan Are you on the Hudi Slack? If yes, could you message me there 
so we can set up a call to understand the issue better? Thanks. 





Re: [I] [SUPPORT] merge into hudi table with ArrayIndexOutOfBoundsException error [hudi]

2023-10-23 Thread via GitHub


ad1happy2go commented on issue #9865:
URL: https://github.com/apache/hudi/issues/9865#issuecomment-1776506908

   @zyclove Can you give more details on what MERGE INTO statement you are 
running, along with your table configuration? I can then check whether what you 
are facing is a known issue. 





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-23 Thread via GitHub


ad1happy2go commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1776503543

   Yeah, the table version gets automatically upgraded when you write using the 
new release; 0.14.0 uses table version 6, so the behaviour is expected. Not sure 
why it failed, though. I will also create a table using 0.12.3, try to upgrade 
it, and see if I hit any issues. 
   
   Do you use Slack? If yes, you can join the Hudi community Slack and we can 
sync up there. 





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-23 Thread via GitHub


zyclove commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1776497904

   Is there a WeChat group or another communication channel where we can talk to 
each other? The community group I joined before felt very inactive, and no one 
discussed the issues.
   @ad1happy2go 





Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-23 Thread via GitHub


zyclove commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1776496036

   @ad1happy2go This issue is the same as 
https://github.com/apache/hudi/issues/9016 .
   The problem was triggered by the upgrade to version 0.14: after the upgrade, 
it suddenly appeared after running for a few days.
   
   After working on it all morning yesterday, there was really nothing I could 
do, so I cleaned up the historical data and ran it again, and it has been normal 
since.
   
   Is it still caused by version compatibility issues?
   
   It was 0.12.3 before. After directly upgrading to the 0.14 bundle package, I 
found that the table version in the hoodie.properties file changed from 5 to 6. 
Does this mean the table was upgraded normally? No manual table upgrade command 
was run.





Re: [I] [BUG]hudi cli command with Wrong FS error [hudi]

2023-10-23 Thread via GitHub


zyclove commented on issue #9903:
URL: https://github.com/apache/hudi/issues/9903#issuecomment-1776479980

   @ad1happy2go `connect --path s://` is OK.
   `compactions show all` works well.





Re: [PR] [HUDI-6962] Fix the conflicts resolution for bulk insert under NB-CC [hudi]

2023-10-23 Thread via GitHub


beyond1920 commented on code in PR #9896:
URL: https://github.com/apache/hudi/pull/9896#discussion_r1369499290


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -2616,6 +2617,22 @@ public Integer getWritesFileIdEncoding() {
 return props.getInteger(WRITES_FILEID_ENCODING, 
HoodieMetadataPayload.RECORD_INDEX_FIELD_FILEID_ENCODING_UUID);
   }
 
+  public boolean needResolveWriteConflict(Option<WriteOperationType> operationType) {
+if (getWriteConcurrencyMode().supportsOptimisticConcurrencyControl()) {
+  // Skip to resolve conflict for non bulk_insert operation if using 
non-blocking concurrency control
+  // TODO: skip resolve conflict if the option is empty or the inner 
operation type is UNKNOWN ?
+  return !isNonBlockingConcurrencyControl() || 
mayBeBulkInsert(operationType);

Review Comment:
   Just to confirm the following two cases:
   1. Do we want to skip conflict resolution if the operationType is 
`Option.empty` or the inner operation type is `UNKNOWN`?
   2. If the operationType is `Option.empty`, `operationType.get()` would throw 
`NoSuchElementException`. Is that what we want? Or should we use 
`BULK_INSERT.equals(operationType.orElse(null))`?








Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-1776409405

   
   ## CI report:
   
   * 75e98fe81be61e02f30d41d798ea86b733a26e2a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20448)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6971] OOM caused by configuring read.start_commit as earliest in stream reading [hudi]

2023-10-23 Thread via GitHub


danny0405 commented on code in PR #9906:
URL: https://github.com/apache/hudi/pull/9906#discussion_r1369461753


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java:
##
@@ -407,21 +408,23 @@ private Result getHollowInputSplits(
   }
 
   @Nullable
-  private InstantRange getInstantRange(String issuedInstant, String 
instantToIssue, boolean nullableBoundary) {
+  private InstantRange getInstantRange(String issuedInstant, String 
startInstant, String instantToIssue, boolean nullableBoundary) {
 if (issuedInstant != null) {
   // the streaming reader may record the last issued instant, if the 
issued instant is present,
   // the instant range should be: (issued instant, the latest instant].
   return 
InstantRange.builder().startInstant(issuedInstant).endInstant(instantToIssue)
   
.nullableBoundary(nullableBoundary).rangeType(InstantRange.RangeType.OPEN_CLOSE).build();
-} else if 
(this.conf.getOptional(FlinkOptions.READ_START_COMMIT).isPresent()) {
-  // first time consume and has a start commit
+} else if 
(this.conf.getOptional(FlinkOptions.READ_START_COMMIT).isPresent()
+&& 
!this.conf.getString(FlinkOptions.READ_START_COMMIT).equalsIgnoreCase(FlinkOptions.START_COMMIT_LATEST))
 {
+  // first time consume: consumes from the earliest commit or from a given 
start commit.
   final String startCommit = 
this.conf.getString(FlinkOptions.READ_START_COMMIT);
   return startCommit.equalsIgnoreCase(FlinkOptions.START_COMMIT_EARLIEST)
-  ? null
+  ? 
InstantRange.builder().startInstant(startInstant).endInstant(instantToIssue)

Review Comment:
   Reading from the latest commit is the default behavior.
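For context on the range semantics in this diff: `RangeType.OPEN_CLOSE` means the start instant is excluded and the end instant is included, i.e. (issuedInstant, instantToIssue]. A small sketch of that membership check over lexicographically ordered instant timestamps (hypothetical helper, not the actual Hudi `InstantRange` class):

```java
// Hudi instants are timestamp strings that order lexicographically,
// so plain String comparison is enough for range membership.
// OPEN_CLOSE: start excluded, end included -> (start, end]
class OpenCloseRange {
  private final String start;
  private final String end;

  OpenCloseRange(String start, String end) {
    this.start = start;
    this.end = end;
  }

  boolean isInRange(String instant) {
    return instant.compareTo(start) > 0 && instant.compareTo(end) <= 0;
  }
}
```

This is why the streaming reader can record the last issued instant and resume without re-reading it: the issued instant itself falls outside the next range.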






Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-1776362033

   
   ## CI report:
   
   * 74dab9f4a045822aef5565ff24cb8bbf15ef0f65 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20410)
 
   * 75e98fe81be61e02f30d41d798ea86b733a26e2a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6962] Fix the conflicts resolution for bulk insert under NB-CC [hudi]

2023-10-23 Thread via GitHub


danny0405 commented on code in PR #9896:
URL: https://github.com/apache/hudi/pull/9896#discussion_r1369460543


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -2616,6 +2617,22 @@ public Integer getWritesFileIdEncoding() {
 return props.getInteger(WRITES_FILEID_ENCODING, 
HoodieMetadataPayload.RECORD_INDEX_FIELD_FILEID_ENCODING_UUID);
   }
 
+  public boolean needResolveWriteConflict(Option<WriteOperationType> operationType) {
+if (getWriteConcurrencyMode().supportsOptimisticConcurrencyControl()) {
+  // Skip to resolve conflict for non bulk_insert operation if using 
non-blocking concurrency control
+  // TODO: skip resolve conflict if the option is empty or the inner 
operation type is UNKNOWN ?
+  return !isNonBlockingConcurrencyControl() || 
mayBeBulkInsert(operationType);

Review Comment:
   Remove the Option from the param and change the logic to:
   
   ```java
   return  BULK_INSERT.equals(operationType.get()) || 
!isNonBlockingConcurrencyControl()
   ```
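One detail worth noting about the suggested snippet: if the `Option` stays in the signature, `operationType.get()` throws `NoSuchElementException` when the option is empty, which is the concern raised in the earlier review comment. A small sketch of the two behaviors using `java.util.Optional` as a stand-in for Hudi's `Option` (hypothetical names, not the actual `HoodieWriteConfig` code):

```java
import java.util.Optional;

class ConflictCheckSketch {
  enum OperationType { BULK_INSERT, UPSERT, UNKNOWN }

  // Throws NoSuchElementException when the option is empty.
  static boolean unsafeIsBulkInsert(Optional<OperationType> op) {
    return OperationType.BULK_INSERT.equals(op.get());
  }

  // Null-safe: an empty option simply compares as "not bulk insert".
  static boolean safeIsBulkInsert(Optional<OperationType> op) {
    return OperationType.BULK_INSERT.equals(op.orElse(null));
  }
}
```

Removing the `Option` from the parameter, as suggested, sidesteps the question entirely; the `orElse(null)` form is the defensive alternative if the option is kept.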






Re: [PR] [HUDI-6482] Supports new compaction strategy DayBasedAndBoundedIOCompactionStrategy [hudi]

2023-10-23 Thread via GitHub


ksmou commented on PR #9126:
URL: https://github.com/apache/hudi/pull/9126#issuecomment-1776358774

   @yihua Please reopen this; I deleted it by mistake. 





[jira] [Updated] (HUDI-6798) Implement event-time-based merging mode in FileGroupReader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6798:

Status: Patch Available  (was: In Progress)

> Implement event-time-based merging mode in FileGroupReader
> --
>
> Key: HUDI-6798
> URL: https://issues.apache.org/jira/browse/HUDI-6798
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6801] Implement merging partial updates from log files for MOR tables [hudi]

2023-10-23 Thread via GitHub


danny0405 commented on code in PR #9883:
URL: https://github.com/apache/hudi/pull/9883#discussion_r1369448954


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java:
##
@@ -195,6 +206,10 @@ public HoodieKey getKey() {
 return key;
   }
 
+  public boolean isPartial() {
+return isPartial;

Review Comment:
   -1, does not make sense






Re: [PR] [HUDI-6801] Implement merging partial updates from log files for MOR tables [hudi]

2023-10-23 Thread via GitHub


danny0405 commented on code in PR #9883:
URL: https://github.com/apache/hudi/pull/9883#discussion_r1369448429


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -126,12 +128,13 @@ protected Option doProcessNextDataRecord(T record,
   // Merge and store the combined record
   // Note that the incoming `record` is from an older commit, so it should 
be put as
   // the `older` in the merge API
+
   HoodieRecord combinedRecord = (HoodieRecord) recordMerger.merge(
-  readerContext.constructHoodieRecord(Option.of(record), metadata, 
readerSchema),
-  readerSchema,
+  readerContext.constructHoodieRecord(Option.of(record), metadata),
+  (Schema) metadata.get(INTERNAL_META_SCHEMA),
   readerContext.constructHoodieRecord(
-  existingRecordMetadataPair.getLeft(), 
existingRecordMetadataPair.getRight(), readerSchema),
-  readerSchema,
+  existingRecordMetadataPair.getLeft(), 
existingRecordMetadataPair.getRight()),
+  (Schema) 
existingRecordMetadataPair.getRight().get(INTERNAL_META_SCHEMA),
   payloadProps).get().getLeft();

Review Comment:
   But it is specific per file at least, right? Then we can initialize it each 
time the reader prepares to read a new file.
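
   One way to make the schema per-file, as suggested here, is to resolve it lazily when the reader opens each file and reuse it for every record merged from that file. The sketch below is illustrative only; the class and method names are invented and are not Hudi's `HoodieFileGroupReader` API.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.function.Function;

   // Illustrative sketch (made-up names, not Hudi's API): resolve the merge
   // schema once per file when the reader prepares it, instead of passing the
   // schema into every per-record merge call.
   class PerFileSchemaSketch {
     private final Map<String, String> schemaByFile = new HashMap<>();
     private final Function<String, String> schemaLoader;

     PerFileSchemaSketch(Function<String, String> schemaLoader) {
       this.schemaLoader = schemaLoader;
     }

     // Called when the reader starts a new file: the schema is resolved exactly
     // once per file path and reused for every record merged from that file.
     String schemaFor(String filePath) {
       return schemaByFile.computeIfAbsent(filePath, schemaLoader);
     }
   }
   ```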






Re: [PR] [HUDI-6801] Implement merging partial updates from log files for MOR tables [hudi]

2023-10-23 Thread via GitHub


danny0405 commented on code in PR #9883:
URL: https://github.com/apache/hudi/pull/9883#discussion_r1369448429


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -126,12 +128,13 @@ protected Option doProcessNextDataRecord(T record,
   // Merge and store the combined record
   // Note that the incoming `record` is from an older commit, so it should 
be put as
   // the `older` in the merge API
+
   HoodieRecord combinedRecord = (HoodieRecord) recordMerger.merge(
-  readerContext.constructHoodieRecord(Option.of(record), metadata, 
readerSchema),
-  readerSchema,
+  readerContext.constructHoodieRecord(Option.of(record), metadata),
+  (Schema) metadata.get(INTERNAL_META_SCHEMA),
   readerContext.constructHoodieRecord(
-  existingRecordMetadataPair.getLeft(), 
existingRecordMetadataPair.getRight(), readerSchema),
-  readerSchema,
+  existingRecordMetadataPair.getLeft(), 
existingRecordMetadataPair.getRight()),
+  (Schema) 
existingRecordMetadataPair.getRight().get(INTERNAL_META_SCHEMA),
   payloadProps).get().getLeft();

Review Comment:
   But it is specific per file at least, right?






Re: [PR] [HUDI-6800] Support writing partial updates to the data blocks in MOR tables [hudi]

2023-10-23 Thread via GitHub


danny0405 commented on code in PR #9876:
URL: https://github.com/apache/hudi/pull/9876#discussion_r1369447936


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala:
##
@@ -411,10 +414,14 @@ object ExpressionPayload {
 parseSchema(props.getProperty(PAYLOAD_RECORD_AVRO_SCHEMA))
   }
 
-  private def getWriterSchema(props: Properties): Schema = {
-
ValidationUtils.checkArgument(props.containsKey(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key),
-  s"Missing ${HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key} property")
-parseSchema(props.getProperty(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key))
+  private def getWriterSchema(props: Properties, isPartialUpdate: Boolean): 
Schema = {
+if (isPartialUpdate) {
+  
parseSchema(props.getProperty(HoodieWriteConfig.WRITE_PARTIAL_UPDATE_SCHEMA.key))

Review Comment:
   Generally we may have 3 modes for fields that are not updated in a partial 
update:
   1. keep it as it is;
   2. force update it to null (which I think should never happen in a real case);
   3. overwrite it with the default (if the default is defined in the schema).
   
   I think 1 is the most natural handling, but in any case, the reader should 
always use its own reader schema for merging, not the writer schema.
   
   Another question is when to evolve the table schema: does it happen before 
or after the commit succeeds?
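
   The "keep it as it is" handling for fields missing from a partial update can be sketched, outside Hudi's record/merger APIs, as a plain map merge. Everything below is illustrative; it is not the `HoodieRecordMerger` implementation.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Illustrative sketch only (not Hudi's HoodieRecordMerger API): fields
   // absent from the partial update keep their old values.
   class PartialMergeSketch {
     // oldRecord holds the full row; partialUpdate holds only the changed fields.
     static Map<String, Object> merge(Map<String, Object> oldRecord,
                                      Map<String, Object> partialUpdate) {
       Map<String, Object> merged = new LinkedHashMap<>(oldRecord);
       merged.putAll(partialUpdate); // updated fields win, untouched fields are kept
       return merged;
     }
   }
   ```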






Re: [PR] [HUDI-6539] New LSM tree style archived timeline [hudi]

2023-10-23 Thread via GitHub


danny0405 commented on PR #9209:
URL: https://github.com/apache/hudi/pull/9209#issuecomment-1776308765

   > Hello, does the master branch now support lsm format merge? @danny0405
   
   No, only the archived timeline uses the LSM layout for instant access.





Re: [PR] [HUDI-6973] Instantiate HoodieFileGroupRecordBuffer inside new file group reader [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9910:
URL: https://github.com/apache/hudi/pull/9910#issuecomment-1776284518

   
   ## CI report:
   
   * f158692bc1611582566b3bbd76e49d07a290e802 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20447)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6973] Instantiate HoodieFileGroupRecordBuffer inside new file group reader [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9910:
URL: https://github.com/apache/hudi/pull/9910#issuecomment-1776271532

   
   ## CI report:
   
   * f158692bc1611582566b3bbd76e49d07a290e802 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20447)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Created] (HUDI-6974) Cleanup config descriptions for consistent language and clarity

2023-10-23 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-6974:
---

 Summary: Cleanup config descriptions for consistent language and 
clarity
 Key: HUDI-6974
 URL: https://issues.apache.org/jira/browse/HUDI-6974
 Project: Apache Hudi
  Issue Type: Task
Reporter: Bhavani Sudha
Assignee: Bhavani Sudha








[jira] [Closed] (HUDI-6972) Fix redirection to individual config links

2023-10-23 Thread Bhavani Sudha (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha closed HUDI-6972.
---
Resolution: Fixed

> Fix redirection to individual config links
> --
>
> Key: HUDI-6972
> URL: https://issues.apache.org/jira/browse/HUDI-6972
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Minor
>  Labels: docs, pull-request-available
>
> Currently, the links for configs are not working as expected. The top of the 
> page is rendered instead of the actual config section.





Re: [PR] [HUDI-6973] Instantiate HoodieFileGroupRecordBuffer inside new file group reader [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9910:
URL: https://github.com/apache/hudi/pull/9910#issuecomment-1776231883

   
   ## CI report:
   
   * f158692bc1611582566b3bbd76e49d07a290e802 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-6973) Instantiate HoodieFileGroupRecordBuffer inside new file group reader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6973:

Reviewers: Danny Chen, Lin Liu

> Instantiate HoodieFileGroupRecordBuffer inside new file group reader
> 
>
> Key: HUDI-6973
> URL: https://issues.apache.org/jira/browse/HUDI-6973
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Closed] (HUDI-6928) Support position based merging in HoodieFileGroupReader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-6928.
---
Resolution: Fixed

> Support position based merging in HoodieFileGroupReader
> ---
>
> Key: HUDI-6928
> URL: https://issues.apache.org/jira/browse/HUDI-6928
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-6973) Instantiate HoodieFileGroupRecordBuffer inside new file group reader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6973:

Status: Patch Available  (was: In Progress)

> Instantiate HoodieFileGroupRecordBuffer inside new file group reader
> 
>
> Key: HUDI-6973
> URL: https://issues.apache.org/jira/browse/HUDI-6973
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-6973) Instantiate HoodieFileGroupRecordBuffer inside new file group reader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6973:

Status: In Progress  (was: Open)

> Instantiate HoodieFileGroupRecordBuffer inside new file group reader
> 
>
> Key: HUDI-6973
> URL: https://issues.apache.org/jira/browse/HUDI-6973
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-6973) Instantiate HoodieFileGroupRecordBuffer inside new file group reader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6973:

Fix Version/s: 1.0.0

> Instantiate HoodieFileGroupRecordBuffer inside new file group reader
> 
>
> Key: HUDI-6973
> URL: https://issues.apache.org/jira/browse/HUDI-6973
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-6973) Instantiate HoodieFileGroupRecordBuffer inside new file group reader

2023-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6973:
-
Labels: pull-request-available  (was: )

> Instantiate HoodieFileGroupRecordBuffer inside new file group reader
> 
>
> Key: HUDI-6973
> URL: https://issues.apache.org/jira/browse/HUDI-6973
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
>






[PR] [HUDI-6973] Instantiate HoodieFileGroupRecordBuffer inside new file group reader [hudi]

2023-10-23 Thread via GitHub


yihua opened a new pull request, #9910:
URL: https://github.com/apache/hudi/pull/9910

   ### Change Logs
   
   This PR refactors the new file group reader (`HoodieFileGroupReader`) to 
instantiate `HoodieFileGroupRecordBuffer` inside the file group reader's 
constructors, instead of having it passed in from outside.
   
   ### Impact
   
   Simplifies the instantiation of the new file group reader.
   
   ### Risk level
   
   none
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-6973) Instantiate HoodieFileGroupRecordBuffer inside new file group reader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6973:

   Epic Link: HUDI-6243
Story Points: 2

> Instantiate HoodieFileGroupRecordBuffer inside new file group reader
> 
>
> Key: HUDI-6973
> URL: https://issues.apache.org/jira/browse/HUDI-6973
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>






[jira] [Updated] (HUDI-6973) Instantiate HoodieFileGroupRecordBuffer inside new file group reader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6973:

Priority: Blocker  (was: Major)

> Instantiate HoodieFileGroupRecordBuffer inside new file group reader
> 
>
> Key: HUDI-6973
> URL: https://issues.apache.org/jira/browse/HUDI-6973
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>






[jira] [Created] (HUDI-6973) Instantiate HoodieFileGroupRecordBuffer inside new file group reader

2023-10-23 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6973:
---

 Summary: Instantiate HoodieFileGroupRecordBuffer inside new file 
group reader
 Key: HUDI-6973
 URL: https://issues.apache.org/jira/browse/HUDI-6973
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








[jira] [Assigned] (HUDI-6973) Instantiate HoodieFileGroupRecordBuffer inside new file group reader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-6973:
---

Assignee: Ethan Guo

> Instantiate HoodieFileGroupRecordBuffer inside new file group reader
> 
>
> Key: HUDI-6973
> URL: https://issues.apache.org/jira/browse/HUDI-6973
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






Re: [I] [SUPPORT] EMR 6.13.0 Hudi cleaning throws method not found for SIMS cache [hudi]

2023-10-23 Thread via GitHub


subash-metica commented on issue #9909:
URL: https://github.com/apache/hudi/issues/9909#issuecomment-1776116310

   Upon looking at the error, it is triggered for a MOR table; only the 
metadata table is MOR, since the base table is COW in my example. It looks 
like the error is not in cleaning but occurs while compacting the metadata 
table, which is MOR. Any leads on how to fix this issue?





Re: [PR] [HUDI-6961] Fix deletes with custom delete field in DefaultHoodieRecordPayload [hudi]

2023-10-23 Thread via GitHub


yihua commented on PR #9892:
URL: https://github.com/apache/hudi/pull/9892#issuecomment-1776087014

   @danny0405 I also changed the payload creation logic for Flink.  Could you 
also review the relevant changes?





[I] [SUPPORT] EMR 6.13.0 Hudi cleaning throws method not found for SIMS cache [hudi]

2023-10-23 Thread via GitHub


subash-metica opened a new issue, #9909:
URL: https://github.com/apache/hudi/issues/9909

   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a COW Hudi table with 10 commits, and then perform a delete. The 
cleaning kicks off but fails with the error below.
   
   **Expected behavior**
   
   Successful clean operation
   
   **Environment Description**
   
   * EMR Version: 6.13.0
   
   * Hudi version : 0.13.1-amz
   
   * Spark version : 3.3.2
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   **Stacktrace**
   
   Caused by: java.lang.IllegalStateException: com.github.benmanes.caffeine.cache.SIMS
       at com.github.benmanes.caffeine.cache.LocalCacheFactory.loadFactory(LocalCacheFactory.java:90) ~[__app__.jar:?]
       at com.github.benmanes.caffeine.cache.LocalCacheFactory.newBoundedLocalCache(LocalCacheFactory.java:40) ~[__app__.jar:?]
       at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalManualCache.(BoundedLocalCache.java:3947) ~[__app__.jar:?]
       at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalManualCache.(BoundedLocalCache.java:3943) ~[__app__.jar:?]
       at com.github.benmanes.caffeine.cache.Caffeine.build(Caffeine.java:1051) ~[__app__.jar:?]
       at org.apache.hudi.common.util.InternalSchemaCache.(InternalSchemaCache.java:72) ~[hudi-utilities-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
       ... 79 more
   Caused by: java.lang.NoSuchMethodException: no such constructor: com.github.benmanes.caffeine.cache.SIMS.(Caffeine,AsyncCacheLoader,boolean)void/newInvokeSpecial
       at java.lang.invoke.MemberName.makeAccessException(MemberName.java:974) ~[?:?]
       at java.lang.invoke.MemberName$Factory.resolveOrFail(MemberName.java:1117) ~[?:?]
       at java.lang.invoke.MethodHandles$Lookup.resolveOrFail(MethodHandles.java:3649) ~[?:?]
       at java.lang.invoke.MethodHandles$Lookup.findConstructor(MethodHandles.java:2750) ~[?:?]
       at com.github.benmanes.caffeine.cache.LocalCacheFactory.loadFactory(LocalCacheFactory.java:85) ~[__app__.jar:?]
       at com.github.benmanes.caffeine.cache.LocalCacheFactory.newBoundedLocalCache(LocalCacheFactory.java:40) ~[__app__.jar:?]
       at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalManualCache.(BoundedLocalCache.java:3947) ~[__app__.jar:?]
       at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalManualCache.(BoundedLocalCache.java:3943) ~[__app__.jar:?]
       at com.github.benmanes.caffeine.cache.Caffeine.build(Caffeine.java:1051) ~[__app__.jar:?]
       at org.apache.hudi.common.util.InternalSchemaCache.(InternalSchemaCache.java:72) ~[hudi-utilities-bundle_2.12-0.13.1-amzn-1.jar:0.13.1-amzn-1]
       ... 79 more
   Caused by: java.lang.NoSuchMethodError: com.github.benmanes.caffeine.cache.SIMS: method 'void (com.github.benmanes.caffeine.cache.Caffeine, com.github.benmanes.caffeine.cache.AsyncCacheLoader, boolean)' not found
       at java.lang.invoke.MethodHandleNatives.resolve(Native Method) ~[?:?]
       at java.lang.invoke.MemberName$Factory.resolve(MemberName.java:1085) ~[?:?]

[jira] [Commented] (HUDI-6910) Handle schema evolution across base and log files in HoodieFileGroupReader

2023-10-23 Thread Ethan Guo (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778832#comment-17778832
 ] 

Ethan Guo commented on HUDI-6910:
-

Part of the changes in HUDI-6801 should fix this.

> Handle schema evolution across base and log files in HoodieFileGroupReader
> --
>
> Key: HUDI-6910
> URL: https://issues.apache.org/jira/browse/HUDI-6910
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> Goal: When the schema evolves from base to log files, the new 
> HoodieFileGroupReader should handle the schema evolution within the file 
> group properly.
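
As a rough illustration of what handling schema evolution within the file group can mean, the sketch below projects a record written with an older schema onto a newer reader schema, filling fields the writer did not know about with reader-side defaults. All names are invented for illustration; this is not `HoodieFileGroupReader` code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only (not HoodieFileGroupReader code): project a record written
// with an older schema onto the reader schema, using the reader schema's
// defaults for fields the writer did not know about.
class SchemaEvolutionSketch {
  static Map<String, Object> project(Map<String, Object> written,
                                     Map<String, Object> readerDefaults) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
      // take the written value when present, else the reader-schema default
      out.put(field.getKey(), written.getOrDefault(field.getKey(), field.getValue()));
    }
    return out;
  }
}
```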





[jira] [Updated] (HUDI-6910) Handle schema evolution across base and log files in HoodieFileGroupReader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6910:

Status: Patch Available  (was: In Progress)

> Handle schema evolution across base and log files in HoodieFileGroupReader
> --
>
> Key: HUDI-6910
> URL: https://issues.apache.org/jira/browse/HUDI-6910
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> Goal: When the schema evolves from base to log files, the new 
> HoodieFileGroupReader should handle the schema evolution within the file 
> group properly.





[jira] [Updated] (HUDI-6801) Implement merging of partial updates in FileGroupReader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6801:

Reviewers: Danny Chen

> Implement merging of partial updates in FileGroupReader
> ---
>
> Key: HUDI-6801
> URL: https://issues.apache.org/jira/browse/HUDI-6801
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-6801) Implement merging of partial updates in FileGroupReader

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6801:

Status: Patch Available  (was: In Progress)

> Implement merging of partial updates in FileGroupReader
> ---
>
> Key: HUDI-6801
> URL: https://issues.apache.org/jira/browse/HUDI-6801
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Closed] (HUDI-6956) Fix CI failure on master

2023-10-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-6956.
---
Resolution: Fixed

> Fix CI failure on master
> 
>
> Key: HUDI-6956
> URL: https://issues.apache.org/jira/browse/HUDI-6956
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> CI failure in GH action running on Spark 2.4
> {code:java}
> 2023-10-18T08:25:11.0927081Z - Test multiple partition fields pruning *** FAILED ***
> 2023-10-18T08:25:11.0928903Z   org.apache.spark.sql.catalyst.parser.ParseException: extraneous input ';' expecting (line 2, pos 53)
> 2023-10-18T08:25:11.0930214Z
> 2023-10-18T08:25:11.0930814Z == SQL ==
> 2023-10-18T08:25:11.0931092Z
> 2023-10-18T08:25:11.0931565Z select * from h171 where day='2023-10-12' and hour=11;
> 2023-10-18T08:25:11.0932258Z -^^^
> 2023-10-18T08:25:11.0933281Z   at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
> 2023-10-18T08:25:11.0934664Z   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
> 2023-10-18T08:25:11.0935909Z   at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
> 2023-10-18T08:25:11.0937200Z   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
> 2023-10-18T08:25:11.0938893Z   at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser$$anonfun$parsePlan$1.apply(HoodieSpark2ExtendedSqlParser.scala:45)
> 2023-10-18T08:25:11.0940866Z   at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser$$anonfun$parsePlan$1.apply(HoodieSpark2ExtendedSqlParser.scala:42)
> 2023-10-18T08:25:11.0942715Z   at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser.parse(HoodieSpark2ExtendedSqlParser.scala:80)
> 2023-10-18T08:25:11.0944508Z   at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser.parsePlan(HoodieSpark2ExtendedSqlParser.scala:42)
> 2023-10-18T08:25:11.0946437Z   at org.apache.spark.sql.parser.HoodieCommonSqlParser$$anonfun$parsePlan$1.apply(HoodieCommonSqlParser.scala:43)
> 2023-10-18T08:25:11.0948031Z   at org.apache.spark.sql.parser.HoodieCommonSqlParser$$anonfun$parsePlan$1.apply(HoodieCommonSqlParser.scala:40)
> 2023-10-18T08:25:11.0949087Z   ...
> 2023-10-18T08:25:31.8632763Z - Test single partiton field pruning *** FAILED ***
> 2023-10-18T08:25:31.8634653Z   org.apache.spark.sql.catalyst.parser.ParseException: extraneous input ';' expecting (line 2, pos 53)
> 2023-10-18T08:25:31.8635951Z
> 2023-10-18T08:25:31.8636595Z == SQL ==
> 2023-10-18T08:25:31.8636881Z
> 2023-10-18T08:25:31.8637365Z select * from h172 where day='2023-10-12' and hour=11;
> 2023-10-18T08:25:31.8638064Z -^^^
> 2023-10-18T08:25:31.8639056Z   at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
> 2023-10-18T08:25:31.8640426Z   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
> 2023-10-18T08:25:31.8641945Z   at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
> 2023-10-18T08:25:31.8643243Z   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
> 2023-10-18T08:25:31.8644939Z   at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser$$anonfun$parsePlan$1.apply(HoodieSpark2ExtendedSqlParser.scala:45)
> 2023-10-18T08:25:31.8646914Z   at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser$$anonfun$parsePlan$1.apply(HoodieSpark2ExtendedSqlParser.scala:42)
> 2023-10-18T08:25:31.8648770Z   at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser.parse(HoodieSpark2ExtendedSqlParser.scala:80)
> 2023-10-18T08:25:31.8650554Z   at org.apache.spark.sql.hudi.parser.HoodieSpark2ExtendedSqlParser.parsePlan(HoodieSpark2ExtendedSqlParser.scala:42)
> 2023-10-18T08:25:31.8652258Z   at org.apache.spark.sql.parser.HoodieCommonSqlParser$$anonfun$parsePlan$1.apply(HoodieCommonSqlParser.scala:43)
> 2023-10-18T08:25:31.8653871Z   at org.apache.spark.sql.parser.HoodieCommonSqlParser$$anonfun$parsePlan$1.apply(HoodieCommonSqlParser.scala:40)
> 2023-10-18T08:25:31.8654880Z   ... {code}
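
The failures above come from Spark 2.4's SQL parser rejecting a trailing `;` as extraneous input. One minimal way to handle this on that profile is to strip statement terminators before handing the string to the parser. The helper below is hypothetical and is not the actual fix from the Hudi codebase.

```java
// Hypothetical helper (not from the Hudi codebase): strip trailing statement
// terminators so Spark 2.4's SQL parser does not see an extraneous ';'.
class SqlTrimSketch {
  static String stripTrailingSemicolons(String sql) {
    String trimmed = sql.trim();
    while (trimmed.endsWith(";")) {
      trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
    }
    return trimmed;
  }
}
```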





Re: [PR] [HUDI-6482] Supports new compaction strategy DayBasedAndBoundedIOCompactionStrategy [hudi]

2023-10-23 Thread via GitHub


yihua commented on PR #9126:
URL: https://github.com/apache/hudi/pull/9126#issuecomment-1775900919

   @ksmou do you still plan to revise this PR?





[jira] [Resolved] (HUDI-6972) Fix redirection to individual config links

2023-10-23 Thread Bhavani Sudha (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha resolved HUDI-6972.
-

> Fix redirection to individual config links
> --
>
> Key: HUDI-6972
> URL: https://issues.apache.org/jira/browse/HUDI-6972
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Minor
>  Labels: docs, pull-request-available
>
> Currently, the links for configs are not working as expected. The top of the 
> page is rendered instead of the actual config section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[hudi] branch asf-site updated: [HUDI-6972][DOCS] Fix config link redirection (#9908)

2023-10-23 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 729dac981de [HUDI-6972][DOCS] Fix config link redirection (#9908)
729dac981de is described below

commit 729dac981deaca25e0c4fcce98eab18c0f6ac5d7
Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
AuthorDate: Mon Oct 23 12:32:37 2023 -0700

[HUDI-6972][DOCS] Fix config link redirection (#9908)
---
 website/src/theme/DocPage/index.js | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/website/src/theme/DocPage/index.js 
b/website/src/theme/DocPage/index.js
index 552adcfa357..a8b5bf2ea36 100644
--- a/website/src/theme/DocPage/index.js
+++ b/website/src/theme/DocPage/index.js
@@ -4,7 +4,7 @@
  * This source code is licensed under the MIT license found in the
  * LICENSE file in the root directory of this source tree.
  */
-import React, {useState, useCallback} from 'react';
+import React, {useState, useCallback, useEffect} from 'react';
 import {MDXProvider} from '@mdx-js/react';
 import renderRoutes from '@docusaurus/renderRoutes';
 import Layout from '@theme/Layout';
@@ -44,6 +44,27 @@ function DocPageContent({
 
 setHiddenSidebarContainer((value) => !value);
   }, [hiddenSidebar]);
+  if(typeof window !== 'undefined') {
+  useEffect(() => {
+  const timeout = setTimeout(() => {
+const [_, hashValue] = window.location.href.split('#');
+
+const element = 
document.querySelectorAll(`[href="#${hashValue}"]`)?.[0];
+if(element) {
+  const headerOffset = 90;
+  const elementPosition = element.getBoundingClientRect().top;
+  const offsetPosition = elementPosition + window.pageYOffset - 
headerOffset;
+  window.scrollTo({
+top: offsetPosition
+  });
+}
+  }, 100);
+
+  return () => {
+clearTimeout(timeout);
+  }
+  }, [window.location.href]);
+  }
   return (
 

Re: [PR] [HUDI-6972][DOCS] Fix config link redirection [hudi]

2023-10-23 Thread via GitHub


bhasudha merged PR #9908:
URL: https://github.com/apache/hudi/pull/9908


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6967] Add clearJobStatus api in HoodieEngineContext [hudi]

2023-10-23 Thread via GitHub


yihua commented on code in PR #9899:
URL: https://github.com/apache/hudi/pull/9899#discussion_r1369141606


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java:
##
@@ -215,6 +219,7 @@ protected List> 
loadColumnRangesFromMetaIndex(
 String keyField = 
hoodieTable.getMetaClient().getTableConfig().getRecordKeyFieldProp();
 
 List> baseFilesForAllPartitions = 
HoodieIndexUtils.getLatestBaseFilesForAllPartitions(partitions, context, 
hoodieTable);
+context.clearJobStatus();

Review Comment:
   This shouldn't be added. Key range loading has not finished here.



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##
@@ -758,7 +762,6 @@ protected void reconcileAgainstMarkers(HoodieEngineContext 
context,
 }
 
 // Now delete partially written files
-context.setJobStatus(this.getClass().getSimpleName(), "Delete all 
partially written files: " + config.getTableName());

Review Comment:
   Why deleting this?



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseWriteHelper.java:
##
@@ -61,6 +61,7 @@ public HoodieWriteMetadata write(String instantTime,
 // perform index loop up to get existing location of records
 context.setJobStatus(this.getClass().getSimpleName(), "Tagging: " + 
table.getConfig().getTableName());
 taggedRecords = tag(dedupedRecords, context, table);
+context.clearJobStatus();

Review Comment:
   If lazy execution happens afterwards, the job status may not be properly 
populated.  Have you verified all places that this won't happen?
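
   The lazy-execution concern above can be sketched with a toy example. This is
   an illustrative Python sketch, not Spark or Hudi code: `tag_records` stands in
   for a lazy transformation, and the status helpers stand in for
   `setJobStatus`/`clearJobStatus`. The point is that clearing the status right
   after building a lazy pipeline means the actual work later runs with no
   status set.

   ```python
   # Illustrative sketch (not Spark/Hudi code): why clearing a job status
   # immediately after building a *lazy* pipeline can mislead -- the work
   # executes later, after the status was already cleared.

   status = {"current": None}

   def set_status(s):
       status["current"] = s

   def clear_status():
       status["current"] = None

   def tag_records(records):
       # Lazy: nothing executes until the generator is consumed.
       return (r.upper() for r in records)

   set_status("Tagging")
   tagged = tag_records(["a", "b"])   # only builds the generator
   clear_status()                     # status cleared before any work ran

   observed = []
   result = []
   for r in tagged:                   # the actual "tagging" happens here
       observed.append(status["current"])
       result.append(r)

   print(result, observed)            # -> ['A', 'B'] [None, None]
   ```

   Under these assumptions, every record is processed while the status is
   already `None`, which mirrors the risk of calling `clearJobStatus` before a
   lazily-evaluated stage has actually executed.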



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java:
##
@@ -111,44 +111,48 @@ public BaseSparkCommitActionExecutor(HoodieEngineContext 
context,
 
   private HoodieData> 
clusteringHandleUpdate(HoodieData> inputRecords) {
 context.setJobStatus(this.getClass().getSimpleName(), "Handling updates 
which are under clustering: " + config.getTableName());
-Set fileGroupsInPendingClustering =
-
table.getFileSystemView().getFileGroupsInPendingClustering().map(Pair::getKey).collect(Collectors.toSet());
-// Skip processing if there is no inflight clustering
-if (fileGroupsInPendingClustering.isEmpty()) {
-  return inputRecords;
-}
+try {
+  Set fileGroupsInPendingClustering =
+  
table.getFileSystemView().getFileGroupsInPendingClustering().map(Pair::getKey).collect(Collectors.toSet());
+  // Skip processing if there is no inflight clustering
+  if (fileGroupsInPendingClustering.isEmpty()) {
+return inputRecords;
+  }
 
-UpdateStrategy>> updateStrategy = 
(UpdateStrategy>>) ReflectionUtils
-.loadClass(config.getClusteringUpdatesStrategyClass(), new Class[] 
{HoodieEngineContext.class, HoodieTable.class, Set.class},
-this.context, table, fileGroupsInPendingClustering);
-// For SparkAllowUpdateStrategy with rollback pending clustering as false, 
need not handle
-// the file group intersection between current ingestion and pending 
clustering file groups.
-// This will be handled at the conflict resolution strategy.
-if (updateStrategy instanceof SparkAllowUpdateStrategy && 
!config.isRollbackPendingClustering()) {
-  return inputRecords;
-}
-Pair>, Set> 
recordsAndPendingClusteringFileGroups =
-updateStrategy.handleUpdate(inputRecords);
+  UpdateStrategy>> updateStrategy = 
(UpdateStrategy>>) ReflectionUtils
+  .loadClass(config.getClusteringUpdatesStrategyClass(), new 
Class[] {HoodieEngineContext.class, HoodieTable.class, Set.class},
+  this.context, table, fileGroupsInPendingClustering);
+  // For SparkAllowUpdateStrategy with rollback pending clustering as 
false, need not handle
+  // the file group intersection between current ingestion and pending 
clustering file groups.
+  // This will be handled at the conflict resolution strategy.
+  if (updateStrategy instanceof SparkAllowUpdateStrategy && 
!config.isRollbackPendingClustering()) {
+return inputRecords;
+  }
+  Pair>, Set> 
recordsAndPendingClusteringFileGroups =
+  updateStrategy.handleUpdate(inputRecords);
 
-Set fileGroupsWithUpdatesAndPendingClustering = 
recordsAndPendingClusteringFileGroups.getRight();
-if (fileGroupsWithUpdatesAndPendingClustering.isEmpty()) {
+  Set fileGroupsWithUpdatesAndPendingClustering = 
recordsAndPendingClusteringFileGroups.getRight();
+  if (fileGroupsWithUpdatesAndPendingClustering.isEmpty()) {
+return recordsAndPendingClusteringFileGroups.getLeft();
+  }
+  // there are file groups pending clustering and receiving updates, so 
rollback the pending clustering instants
+  // there could be race condition, for example, if the clustering 
completes aft

Re: [PR] [HUDI-6801] Implement merging partial updates from log files for MOR tables [hudi]

2023-10-23 Thread via GitHub


yihua commented on code in PR #9883:
URL: https://github.com/apache/hudi/pull/9883#discussion_r1369126255


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -126,12 +128,13 @@ protected Option doProcessNextDataRecord(T record,
   // Merge and store the combined record
   // Note that the incoming `record` is from an older commit, so it should 
be put as
   // the `older` in the merge API
+
   HoodieRecord combinedRecord = (HoodieRecord) recordMerger.merge(
-  readerContext.constructHoodieRecord(Option.of(record), metadata, 
readerSchema),
-  readerSchema,
+  readerContext.constructHoodieRecord(Option.of(record), metadata),
+  (Schema) metadata.get(INTERNAL_META_SCHEMA),
   readerContext.constructHoodieRecord(
-  existingRecordMetadataPair.getLeft(), 
existingRecordMetadataPair.getRight(), readerSchema),
-  readerSchema,
+  existingRecordMetadataPair.getLeft(), 
existingRecordMetadataPair.getRight()),
+  (Schema) 
existingRecordMetadataPair.getRight().get(INTERNAL_META_SCHEMA),
   payloadProps).get().getLeft();

Review Comment:
   When there are more log files, partial updates, and schema evolution, 
`(Schema) metadata.get(INTERNAL_META_SCHEMA)` can be different across record 
keys.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6801] Implement merging partial updates from log files for MOR tables [hudi]

2023-10-23 Thread via GitHub


yihua commented on code in PR #9883:
URL: https://github.com/apache/hudi/pull/9883#discussion_r1369119288


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/BaseSparkInternalRowReaderContext.java:
##
@@ -94,17 +94,18 @@ public Comparable getOrderingValue(Option 
rowOption,
 
   @Override
   public HoodieRecord constructHoodieRecord(Option 
rowOption,
- Map 
metadataMap,
- Schema schema) {
+ Map 
metadataMap) {
 if (!rowOption.isPresent()) {
   return new HoodieEmptyRecord<>(
   new HoodieKey((String) metadataMap.get(INTERNAL_META_RECORD_KEY),
   (String) metadataMap.get(INTERNAL_META_PARTITION_PATH)),
   HoodieRecord.HoodieRecordType.SPARK);
 }
 
+Schema schema = (Schema) metadataMap.get(INTERNAL_META_SCHEMA);
 InternalRow row = rowOption.get();
-return new HoodieSparkRecord(row, 
HoodieInternalRowUtils.getCachedSchema(schema));
+boolean isPartial = (boolean) 
metadataMap.getOrDefault(INTERNAL_META_IS_PARTIAL, false);
+return new HoodieSparkRecord(row, 
HoodieInternalRowUtils.getCachedSchema(schema), isPartial);

Review Comment:
   Reason mentioned above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6801] Implement merging partial updates from log files for MOR tables [hudi]

2023-10-23 Thread via GitHub


yihua commented on code in PR #9883:
URL: https://github.com/apache/hudi/pull/9883#discussion_r1369118091


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java:
##
@@ -195,6 +206,10 @@ public HoodieKey getKey() {
 return key;
   }
 
+  public boolean isPartial() {
+return isPartial;

Review Comment:
   `isPartial` is determined at the commit or write batch level, but for record 
merging to work in the current implementation and maintain the layering, it's 
better to have the flag at the record level.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6800] Support writing partial updates to the data blocks in MOR tables [hudi]

2023-10-23 Thread via GitHub


yihua commented on code in PR #9876:
URL: https://github.com/apache/hudi/pull/9876#discussion_r1369060242


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/payload/ExpressionPayload.scala:
##
@@ -411,10 +414,14 @@ object ExpressionPayload {
 parseSchema(props.getProperty(PAYLOAD_RECORD_AVRO_SCHEMA))
   }
 
-  private def getWriterSchema(props: Properties): Schema = {
-
ValidationUtils.checkArgument(props.containsKey(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key),
-  s"Missing ${HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key} property")
-parseSchema(props.getProperty(HoodieWriteConfig.WRITE_SCHEMA_OVERRIDE.key))
+  private def getWriterSchema(props: Properties, isPartialUpdate: Boolean): 
Schema = {
+if (isPartialUpdate) {
+  
parseSchema(props.getProperty(HoodieWriteConfig.WRITE_PARTIAL_UPDATE_SCHEMA.key))

Review Comment:
   In this PR, for updates in MOR tables, after processing the Spark SQL MERGE 
INTO statement, the writer gets the updates with partial schema and pass them 
to the `HoodieAppendHandle`.  Regardless, the original intent to include 
`FULL_SCHEMA` is for merging partial updates at the reader side.
   
   If we assume that values for a non-updated column should be either existing 
value (column in the existing schema) or null (new column in the evolved 
schema) in merging partial updates, the `FULL_SCHEMA` may not be stored in the 
log block header.  See the following examples:
   
   ```
   Example 1:
   base file: schema (col1, col2) (full schema at this instant: (col1, col2))
   log 1: partial, schema (col2, col3) (full schema at this instant: (col1, 
col2, col3))
   after log merging: schema (col1, col2, col3) 
   (col1 values from base file, col2, col3 values from log1 for overwrite with 
latest)
   
   Example 2:
   base file: schema (col1, col2) (full schema at this instant: (col1, col2))
   log 1: partial, schema (col2, col3) (full schema at this instant: (col1, 
col2, col3, col4))
   after log merging: schema (col1, col2, col3)
   project to full schema: (col1, col2, col3) -> (col1, col2, col3, col4), with 
nulls in col4
   (col1 values from base file, col2, col3 values from log1 for overwrite with 
latest, col4 has nulls)
   ```
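   
   The two examples above can be made concrete with a small sketch. This is an
   illustrative Python model, not Hudi's actual merge implementation: schemas
   are plain column lists, merging is overwrite-with-latest, and columns present
   only in the evolved full schema come out as nulls.
   
   ```python
   # Illustrative sketch (not Hudi code): merge a partial-update log record
   # over a base record, then project to the evolved full schema, filling
   # columns absent from both sides with None.
   
   def merge_partial(base_schema, base_row, log_schema, log_row, full_schema):
       """Overwrite-with-latest: log values win for columns they carry;
       columns only in the full (evolved) schema come out as None."""
       base = dict(zip(base_schema, base_row))
       log = dict(zip(log_schema, log_row))
       merged = {**base, **log}          # newer (log) values overwrite base
       return [merged.get(col) for col in full_schema]
   
   # Example 2 above: base (col1, col2), partial log (col2, col3),
   # full schema at the log's instant is (col1, col2, col3, col4).
   result = merge_partial(
       ["col1", "col2"], [1, 2],
       ["col2", "col3"], [20, 30],
       ["col1", "col2", "col3", "col4"])
   print(result)  # -> [1, 20, 30, None]
   ```
   
   Under this assumption the full schema never needs to be stored in the log
   block header: col1 survives from the base file, col2/col3 come from the
   partial log record, and col4 is null-filled at projection time.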



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6800] Support writing partial updates to the data blocks in MOR tables [hudi]

2023-10-23 Thread via GitHub


yihua commented on code in PR #9876:
URL: https://github.com/apache/hudi/pull/9876#discussion_r1369039030


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##
@@ -652,6 +660,16 @@ private static Map 
getUpdatedHeader(Map

Re: [PR] [HUDI-6972][DOCS] Fix config link redirection [hudi]

2023-10-23 Thread via GitHub


bhasudha commented on PR #9908:
URL: https://github.com/apache/hudi/pull/9908#issuecomment-1775287986

   Tested locally 2 things:
   1. Within configs page clicking any config link renders it properly. Shown 
here after clicking.
   2. Tested redirection to specific configs from other pages. Cannot show the 
test here since it would need a video screen capture.
   
   Test for 1. described above.
   https://github.com/apache/hudi/assets/2179254/ab1d1fad-110a-4316-8452-5c125c80
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-6972) Fix redirection to individual config links

2023-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6972:
-
Labels: docs pull-request-available  (was: docs)

> Fix redirection to individual config links
> --
>
> Key: HUDI-6972
> URL: https://issues.apache.org/jira/browse/HUDI-6972
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Minor
>  Labels: docs, pull-request-available
>
> Currently, the links for configs are not working as expected. The top of the 
> page is rendered instead of the actual config section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-6972][DOCS] Fix config link redirection [hudi]

2023-10-23 Thread via GitHub


bhasudha opened a new pull request, #9908:
URL: https://github.com/apache/hudi/pull/9908

   ### Change Logs
   
   website fixes to ensure config links are working as expected.
   
   ### Impact
   
   website changes
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-6972) Fix redirection to individual config links

2023-10-23 Thread Bhavani Sudha (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha updated HUDI-6972:

Status: In Progress  (was: Open)

> Fix redirection to individual config links
> --
>
> Key: HUDI-6972
> URL: https://issues.apache.org/jira/browse/HUDI-6972
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Minor
>  Labels: docs
>
> Currently, the links for configs are not working as expected. The top of the 
> page is rendered instead of the actual config section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6972) Fix redirection to individual config links

2023-10-23 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-6972:
---

 Summary: Fix redirection to individual config links
 Key: HUDI-6972
 URL: https://issues.apache.org/jira/browse/HUDI-6972
 Project: Apache Hudi
  Issue Type: Task
Reporter: Bhavani Sudha


Currently, the links for configs are not working as expected. The top of the 
page is rendered instead of the actual config section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6972) Fix redirection to individual config links

2023-10-23 Thread Bhavani Sudha (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha updated HUDI-6972:

Priority: Minor  (was: Major)

> Fix redirection to individual config links
> --
>
> Key: HUDI-6972
> URL: https://issues.apache.org/jira/browse/HUDI-6972
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Priority: Minor
>  Labels: docs
>
> Currently, the links for configs are not working as expected. The top of the 
> page is rendered instead of the actual config section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-6972) Fix redirection to individual config links

2023-10-23 Thread Bhavani Sudha (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha reassigned HUDI-6972:
---

Assignee: Bhavani Sudha

> Fix redirection to individual config links
> --
>
> Key: HUDI-6972
> URL: https://issues.apache.org/jira/browse/HUDI-6972
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Minor
>  Labels: docs
>
> Currently, the links for configs are not working as expected. The top of the 
> page is rendered instead of the actual config section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HUDI-6112) Improve Doc generatiion to generate config tables for basic and advanced configs

2023-10-23 Thread Bhavani Sudha (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha resolved HUDI-6112.
-

> Improve Doc generatiion to generate config tables for basic and advanced 
> configs
> 
>
> Key: HUDI-6112
> URL: https://issues.apache.org/jira/browse/HUDI-6112
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.1
>
>
> The HoodieConfigDocGenerator will need to be modified such that:
>  * Each config group has two sections: basic configs and advanced configs
>  * Basic configs and Advanced configs are played out in a table instead of a 
> serially like today.
>  * Among each of these tables the required configs are bubbled up to the top 
> of the table and highlighted.
> Add UI fixes to support a table layout



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6970] Stream read allows skipping archived commits [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9905:
URL: https://github.com/apache/hudi/pull/9905#issuecomment-1775266175

   
   ## CI report:
   
   * 31be10290de4f6bbc9ecd385202ee9c1d655eac2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20444)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9904:
URL: https://github.com/apache/hudi/pull/9904#issuecomment-1775266079

   
   ## CI report:
   
   * 23af1b3753a523ffd717b7fb56a87501f3327adf Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20443)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6971] OOM caused by configuring read.start_commit as earliest in stream reading [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9906:
URL: https://github.com/apache/hudi/pull/9906#issuecomment-1775218968

   
   ## CI report:
   
   * 28cd284a93f70e853ae3d9373fd01df3aa5c12cf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20445)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6866]When invalidate the table in the spark sql query cache, verify if theā€¦ [hudi]

2023-10-23 Thread via GitHub


zhangyue19921010 merged PR #9425:
URL: https://github.com/apache/hudi/pull/9425


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[hudi] branch master updated (bb8fc3e9f63 -> fe010bb1855)

2023-10-23 Thread zhangyue19921010
This is an automated email from the ASF dual-hosted git repository.

zhangyue19921010 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from bb8fc3e9f63 [HUDI-6929] Lazy loading dynamically for 
CompletionTimeQueryView (#9898)
 add fe010bb1855 When invalidate the table in the spark sql query cache, 
verify if the hive-async database exists (#9425)

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala| 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)



Re: [I] [SUPPORT] AWS Athena query fail when compaction is scheduled for MOR table [hudi]

2023-10-23 Thread via GitHub


ad1happy2go commented on issue #9907:
URL: https://github.com/apache/hudi/issues/9907#issuecomment-1775119269

   @brightwon Interesting. Thanks for raising this. Looks like a regression. 
Can you provide the full stack trace?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] AWS Athena query fail when compaction is scheduled for MOR table [hudi]

2023-10-23 Thread via GitHub


brightwon commented on issue #9907:
URL: https://github.com/apache/hudi/issues/9907#issuecomment-1775080933

   Now, I downgraded my Hudi version to 0.13.1 and the error no longer occurs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [SUPPORT] AWS Athena query fail when compaction is scheduled for MOR table [hudi]

2023-10-23 Thread via GitHub


brightwon opened a new issue, #9907:
URL: https://github.com/apache/hudi/issues/9907

   I'm using hudi 0.14.0 with flink 1.16.1 to store data from kafka to s3.
   but Athena (Engine 3) queries on the MOR table fail with this error.
   
   ```
   Error running query: HIVE_UNKNOWN_ERROR: 
io.trino.plugin.hive.s3.TrinoS3FileSystem$UnrecoverableS3OperationException: 
com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not 
exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request 
ID: ***; S3 Extended Request ID: ***; Proxy: null), S3 Extended Request ID: *** 
(Bucket: mybucket, Key: 
mytable/.hoodie/.aux/20231014095517882.compaction.requested)
   ```
   
   This error occurs if compaction is scheduled.
   After compaction is complete, query is working.
   
   Here's flink hudi option (Java)
   ```
   flinkHudiOptions.put(FlinkOptions.PATH.key(), basePath);
   flinkHudiOptions.put(FlinkOptions.TABLE_TYPE.key(), 
HoodieTableType.MERGE_ON_READ.name());
   flinkHudiOptions.put(FlinkOptions.OPERATION.key(), 
WriteOperationType.UPSERT.name());
   flinkHudiOptions.put(FlinkOptions.PRECOMBINE_FIELD.key(), "event_time");
   flinkHudiOptions.put(FlinkOptions.KEYGEN_CLASS_NAME.key(), 
"org.apache.hudi.keygen.ComplexKeyGenerator");
   flinkHudiOptions.put(FlinkOptions.COMPACTION_ASYNC_ENABLED.key(), "true");
   flinkHudiOptions.put(FlinkOptions.COMPACTION_TRIGGER_STRATEGY.key(), 
FlinkOptions.NUM_COMMITS);
   flinkHudiOptions.put(FlinkOptions.COMPACTION_DELTA_COMMITS.key(), "5");
   flinkHudiOptions.put(FlinkOptions.COMPACTION_MAX_MEMORY.key(), "1024");
   flinkHudiOptions.put(FlinkOptions.METADATA_ENABLED.key(), "true");
   flinkHudiOptions.put(HoodieMetadataConfig.ASYNC_INDEX_ENABLE.key(), "true");
   
flinkHudiOptions.put(HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key(),
 "true");
   flinkHudiOptions.put(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), 
WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.name());
   flinkHudiOptions.put(HoodieLockConfig.LOCK_PROVIDER_CLASS_NAME.key(), 
"org.apache.hudi.client.transaction.lock.InProcessLockProvider");
   flinkHudiOptions.put(FlinkOptions.CLEAN_ASYNC_ENABLED.key(), "true");
   flinkHudiOptions.put(FlinkOptions.CLEAN_POLICY.key(), 
HoodieCleaningPolicy.KEEP_LATEST_BY_HOURS.name());
   flinkHudiOptions.put(FlinkOptions.CLEAN_RETAIN_HOURS.key(), "24");
   ```
   
   My flink application works on flink-operator's FlinkDeployment (on AWS EKS).
   I ran the hive-sync command once in EMR 6.10.0 (Hudi 0.12.2-amzn-0 version) 
for easy use of Glue MetaStore.
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. run flink application with above options
   2. run hive-sync once on EMR
   3. run athena query when compaction is scheduled
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.14.0
   
   * Flink version : 1.16.1
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] HoodieCompaction with schema parse NullPointerException [hudi]

2023-10-23 Thread via GitHub


ad1happy2go commented on issue #9902:
URL: https://github.com/apache/hudi/issues/9902#issuecomment-1775072273

   @zyclove Thanks for raising this. Looks like compaction is throwing this 
exception with those schema configurations. I will try to triage this. Can you 
share some sample data or a sample script that helps us reproduce 
this issue?
   
   I tried to reproduce using below code and see compaction happening fine - 
   ```
   SET hoodie.schema.on.read.enable=true;
   SET hoodie.datasource.write.reconcile.schema=true;
   SET hoodie.avro.schema.validate=true;
   SET hoodie.datasource.write.new.columns.nullable=true;
   
   CREATE TABLE hudi_table (
   ts BIGINT,
   uuid STRING,
   rider STRING,
   driver STRING,
   fare DECIMAL(10,4),
   city STRING
   ) USING HUDI
   tblproperties (
   type = 'mor', primaryKey = 'uuid', preCombineField = 'ts'
   ,hoodie.datasource.write.new.columns.nullable = 'true'
   ,hoodie.avro.schema.validate = 'true'
   ,hoodie.schema.on.read.enable = 'true'
   ,hoodie.datasource.write.reconcile.schema = 'true'
   )
   PARTITIONED BY (city);
   
   -- Tried multiple insert commands with multiple values and confirmed 
compaction is happening fine.
   INSERT INTO hudi_table
   VALUES
   
(1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',11.0001,'san_francisco'),
   
(1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',11.0001
 ,'san_francisco');
   ```
   





Re: [I] [BUG]hudi cli command with Wrong FS error [hudi]

2023-10-23 Thread via GitHub


ad1happy2go commented on issue #9903:
URL: https://github.com/apache/hudi/issues/9903#issuecomment-1775045166

   @zyclove Are you able to run other CLI commands fine? Just to check whether 
the S3 connection works from the CLI.
   





Re: [I] [SUPPORT] ERROR BaseSparkCommitActionExecutor: Error upserting bucketType UPDATE for partition :13 [hudi]

2023-10-23 Thread via GitHub


zyclove commented on issue #9119:
URL: https://github.com/apache/hudi/issues/9119#issuecomment-1775028059

   @danny0405 This problem still exists in version 0.14 too. How can it be solved?





Re: [I] [SUPPORT]spark-sql MOR query error with org.apache.avro.SchemaParseException: Cannot parse schema [hudi]

2023-10-23 Thread via GitHub


zyclove commented on issue #9016:
URL: https://github.com/apache/hudi/issues/9016#issuecomment-1775022886

   @ad1happy2go This problem still exists in version 0.14. How can it be solved?





Re: [PR] [HUDI-6971] OOM caused by configuring read.start_commit as earliest in stream reading [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9906:
URL: https://github.com/apache/hudi/pull/9906#issuecomment-1775007181

   
   ## CI report:
   
   * 28cd284a93f70e853ae3d9373fd01df3aa5c12cf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20445)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6971] OOM caused by configuring read.start_commit as earliest in stream reading [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9906:
URL: https://github.com/apache/hudi/pull/9906#issuecomment-1774940343

   
   ## CI report:
   
   * 28cd284a93f70e853ae3d9373fd01df3aa5c12cf UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6962] Fix the conflicts resolution for bulk insert under NB-CC [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9896:
URL: https://github.com/apache/hudi/pull/9896#issuecomment-1774940114

   
   ## CI report:
   
   * 9ab01f405b75097cb3d1c610d7e47c0eed92b10d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20442)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9904:
URL: https://github.com/apache/hudi/pull/9904#issuecomment-1774940196

   
   ## CI report:
   
   * 23af1b3753a523ffd717b7fb56a87501f3327adf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20443)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6970] Stream read allows skipping archived commits [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9905:
URL: https://github.com/apache/hudi/pull/9905#issuecomment-1774940259

   
   ## CI report:
   
   * 31be10290de4f6bbc9ecd385202ee9c1d655eac2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20444)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6969] Add speed limit for stream read [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9904:
URL: https://github.com/apache/hudi/pull/9904#issuecomment-1774925985

   
   ## CI report:
   
   * 23af1b3753a523ffd717b7fb56a87501f3327adf UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6970] Stream read allows skipping archived commits [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9905:
URL: https://github.com/apache/hudi/pull/9905#issuecomment-1774926097

   
   ## CI report:
   
   * 31be10290de4f6bbc9ecd385202ee9c1d655eac2 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[PR] oom [hudi]

2023-10-23 Thread via GitHub


zhuanshenbsj1 opened a new pull request, #9906:
URL: https://github.com/apache/hudi/pull/9906

   ### Change Logs
   
   1. When you set the config read.start_commit to earliest,
   
   
https://github.com/apache/hudi/blob/bb8fc3e9f632a1fc3647fda63d482849355df2b7/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java#L410-L428
   
   the method getInstantRange will return null, 
   
   
https://github.com/apache/hudi/blob/bb8fc3e9f632a1fc3647fda63d482849355df2b7/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/IncrementalInputSplits.java#L289-L298
   
   which causes all partitions and files to be loaded subsequently, which is 
unreasonable.
   
   2. Because developers are accustomed to consuming from Kafka, they often 
prefer to set the consumption starting point to earliest.
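
The fix described above can be sketched as follows. This is a minimal illustration of the idea only: the timeline is modeled as a sorted list of instant-time strings, and `resolveStartInstant` is a hypothetical name, not the actual Hudi API.

```java
import java.util.List;

// Sketch: instead of returning null for "earliest" (which disables instant
// filtering entirely and loads every partition and file), clamp the start of
// the instant range to the first instant still on the active timeline.
public class InstantRangeSketch {
  static String resolveStartInstant(String startCommit, List<String> activeInstants) {
    if ("earliest".equals(startCommit)) {
      // A bounded range starting at the first active instant keeps filtering on.
      return activeInstants.isEmpty() ? null : activeInstants.get(0);
    }
    return startCommit;
  }

  public static void main(String[] args) {
    List<String> timeline = List.of("20231001", "20231002", "20231003");
    System.out.println(resolveStartInstant("earliest", timeline)); // 20231001
    System.out.println(resolveStartInstant("20231002", timeline)); // 20231002
  }
}
```

With a non-null lower bound, the subsequent split generation can prune by instant range instead of scanning the whole table.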
   
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[PR] [HUDI-6970] Stream read allows skipping archived commits [hudi]

2023-10-23 Thread via GitHub


zhuanshenbsj1 opened a new pull request, #9905:
URL: https://github.com/apache/hudi/pull/9905

   ### Change Logs
   
   With the current code, commits that have already been archived are still 
read. In most scenarios, cleaning runs before archiving (except for 
compaction), so it is generally unnecessary to read archived metadata. 
Moreover, if start_commit is set too early, a large number of unnecessary 
commits can be loaded, resulting in OOM.
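
The skipping logic can be sketched as below, assuming a simplified model where instants are sorted strings; `instantsToRead` is an illustrative name, not the actual Hudi API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: restrict a stream read to instants still on the active timeline,
// so anything already archived is skipped rather than loaded from the
// archived timeline (which can OOM the reader for very old start commits).
public class SkipArchivedSketch {
  static List<String> instantsToRead(String startCommit, List<String> activeTimeline) {
    // The active timeline already excludes archived instants; only apply the
    // requested lower bound on top of it.
    return activeTimeline.stream()
        .filter(instant -> instant.compareTo(startCommit) >= 0)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Active timeline begins at 20231002; older instants were archived.
    List<String> active = List.of("20231002", "20231003");
    System.out.println(instantsToRead("20230901", active)); // [20231002, 20231003]
  }
}
```

Even though the requested start commit (20230901) predates the active timeline, only active instants are read.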
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[PR] [HUDI-6969] Add speed limit for stream read [hudi]

2023-10-23 Thread via GitHub


zhuanshenbsj1 opened a new pull request, #9904:
URL: https://github.com/apache/hudi/pull/9904

   ### Change Logs
   
   Currently there is no speed limit for stream read: regardless of the 
instant ranges, everything is read at once, which can easily cause GC pressure 
in the monitor operator.
   This change adds a configuration to limit the number of commits read per 
round in stream read mode.
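
A minimal sketch of such a per-round cap, with hypothetical names (not the actual Hudi API): each monitor round hands at most N commits downstream and carries the remainder over to the next round.

```java
import java.util.List;

// Sketch: cap how many pending commits one monitor round emits, so a long
// backlog is drained over several rounds instead of all at once.
public class ReadLimitSketch {
  static List<String> nextBatch(List<String> pendingCommits, int maxCommitsPerRound) {
    int end = Math.min(pendingCommits.size(), maxCommitsPerRound);
    // Commits beyond 'end' stay pending and are picked up next round.
    return pendingCommits.subList(0, end);
  }

  public static void main(String[] args) {
    List<String> pending = List.of("c1", "c2", "c3", "c4", "c5");
    System.out.println(nextBatch(pending, 2)); // [c1, c2]
  }
}
```

The cap bounds the per-round working set, which is what keeps the monitor operator's heap usage stable.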
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-6821] Support multiple base file formats in Hudi table [hudi]

2023-10-23 Thread via GitHub


codope commented on code in PR #9761:
URL: https://github.com/apache/hudi/pull/9761#discussion_r1368419186


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieMultiFileFormatRelation.scala:
##
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.HoodieBaseRelation.projectReader
+import org.apache.hudi.HoodieConversionUtils.toScalaOption
+import org.apache.hudi.HoodieMultiFileFormatRelation.{createPartitionedFile, 
inferFileFormat}
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.common.model.{FileSlice, HoodieFileFormat, 
HoodieLogFile}
+import org.apache.hudi.common.table.HoodieTableMetaClient
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.execution.datasources.{FilePartition, 
PartitionedFile}
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.types.StructType
+
+import scala.jdk.CollectionConverters.asScalaIteratorConverter
+
+/**
+ * Base split for all Hoodie multi-file format relations.
+ */
+case class HoodieMultiFileFormatSplit(baseFile: Option[PartitionedFile],
+  logFiles: List[HoodieLogFile]) extends 
HoodieFileSplit
+
+/**
+ * Base relation to handle table with multiple base file formats.
+ */
+abstract class BaseHoodieMultiFileFormatRelation(override val sqlContext: 
SQLContext,
+ override val metaClient: 
HoodieTableMetaClient,

Review Comment:
   Discussed offline. We think that implementing a new `FileFormat` which works 
with multiple base file formats should be possible. So, i'm going to attempt 
that.






[jira] [Created] (HUDI-6971) OOM caused by configuring read.start_commit as earliest in stream reading

2023-10-23 Thread zhuanshenbsj1 (Jira)
zhuanshenbsj1 created HUDI-6971:
---

 Summary: OOM caused by configuring read.start_commit as earliest 
in stream reading
 Key: HUDI-6971
 URL: https://issues.apache.org/jira/browse/HUDI-6971
 Project: Apache Hudi
  Issue Type: Improvement
  Components: reader-core
Reporter: zhuanshenbsj1






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6970) Stream read allows skipping archived commits

2023-10-23 Thread zhuanshenbsj1 (Jira)
zhuanshenbsj1 created HUDI-6970:
---

 Summary: Stream read allows skipping archived commits
 Key: HUDI-6970
 URL: https://issues.apache.org/jira/browse/HUDI-6970
 Project: Apache Hudi
  Issue Type: Improvement
  Components: reader-core
Reporter: zhuanshenbsj1






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-6969) Add speed limit for stream read

2023-10-23 Thread zhuanshenbsj1 (Jira)
zhuanshenbsj1 created HUDI-6969:
---

 Summary: Add speed limit for stream read
 Key: HUDI-6969
 URL: https://issues.apache.org/jira/browse/HUDI-6969
 Project: Apache Hudi
  Issue Type: Improvement
  Components: reader-core
Reporter: zhuanshenbsj1






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6821] Support multiple base file formats in Hudi table [hudi]

2023-10-23 Thread via GitHub


hudi-bot commented on PR #9761:
URL: https://github.com/apache/hudi/pull/9761#issuecomment-1774816678

   
   ## CI report:
   
   * 89e72e0fdf9229f34d23ee7245676eaa9a323418 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20440)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




