[jira] [Updated] (HUDI-7815) Multiple writers with bulk insert: getAllPendingClusteringPlans should refresh the timeline

2024-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7815:
-
Labels: pull-request-available  (was: )

> Multiple writers with bulk insert: getAllPendingClusteringPlans should 
> refresh the timeline
> 
>
> Key: HUDI-7815
> URL: https://issues.apache.org/jira/browse/HUDI-7815
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7814) Exclude unused transitive dependencies that introduce vulnerabilities

2024-05-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7814:
-
Labels: pull-request-available  (was: )

> Exclude unused transitive dependencies that introduce vulnerabilities
> -
>
> Key: HUDI-7814
> URL: https://issues.apache.org/jira/browse/HUDI-7814
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7812) Async Clustering w/ row writer fails due to timetravel query validation

2024-05-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7812:
-
Labels: pull-request-available  (was: )

> Async Clustering w/ row writer fails due to timetravel query validation 
> 
>
> Key: HUDI-7812
> URL: https://issues.apache.org/jira/browse/HUDI-7812
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: clustering
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> With the clustering row-writer flow enabled, we trigger a time travel query to 
> read input records. But the query side fails if there are any pending commits 
> (due to new ingestion) whose timestamp < clustering instant time. We need to 
> relax this constraint. 
>  





[jira] [Updated] (HUDI-7810) Fix OptionsResolver#allowCommitOnEmptyBatch default value bug

2024-05-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7810:
-
Labels: pull-request-available  (was: )

> Fix OptionsResolver#allowCommitOnEmptyBatch default value bug
> -
>
> Key: HUDI-7810
> URL: https://issues.apache.org/jira/browse/HUDI-7810
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: bradley
>Priority: Major
>  Labels: pull-request-available
>
> Fix OptionsResolver#allowCommitOnEmptyBatch default value bug





[jira] [Updated] (HUDI-7808) Security upgrade io.acryl:datahub-client from 0.8.31 to 0.8.45

2024-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7808:
-
Labels: pull-request-available  (was: )

> Security upgrade io.acryl:datahub-client from 0.8.31 to 0.8.45
> --
>
> Key: HUDI-7808
> URL: https://issues.apache.org/jira/browse/HUDI-7808
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[jira] [Updated] (HUDI-7809) Use Spark SerializableConfiguration to avoid NPE in Kryo serde

2024-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7809:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Use Spark SerializableConfiguration to avoid NPE in Kryo serde
> --
>
> Key: HUDI-7809
> URL: https://issues.apache.org/jira/browse/HUDI-7809
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> With Hudi 0.14.1, without 
> "spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar", the Hudi 
> query in the Spark quick start guide succeeds.  In Hudi 0.15.0-rc2, without the 
> Kryo registrar, the Hudi read throws an NPE due to HadoopStorageConfiguration.
> {code:java}
> Driver stacktrace:
>   at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2450)
>   at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2399)
>   at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2398)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2398)
>   at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1156)
>   at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1156)
>   at scala.Option.foreach(Option.scala:407)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1156)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2638)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2580)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2569)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
>   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2224)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2245)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2264)
>   at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:492)
>   at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:445)
>   at 
> org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48)
>   at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3715)
>   at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2728)
>   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3706)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3704)
>   at org.apache.spark.sql.Dataset.head(Dataset.scala:2728)
>   at org.apache.spark.sql.Dataset.take(Dataset.scala:2935)
>   at org.apache.spark.sql.Dataset.getRows(Dataset.scala:287)
>   at org.apache.spark.sql.Dataset.showString(Dataset.scala:326)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:806)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:765)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:774)
>   ... 47 elided
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.datasources.parquet.Spark32LegacyHoodieParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(Spark32LegacyHoodieParquetFileFormat.scala:152)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:187)
>   at 
> org.apa
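The fix named in the title is to carry the Hadoop configuration inside Spark's SerializableConfiguration wrapper so serde never leaves a null field behind. A minimal sketch of that wrapper pattern follows, using java.util.Properties as a stand-in for Hadoop's Configuration and plain Java serialization in place of Kryo; class and method names here are invented for illustration, not Hudi's or Spark's actual code.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Sketch of the SerializableConfiguration pattern: wrap a config object and
// hand-write the serde so a round trip rebuilds it instead of producing a
// null field (the source of the NPE described above).
public class SerializableConf implements Serializable {
    // transient: the wrapped config is rebuilt by hand in readObject, so no
    // serializer registration is needed for the wrapped type itself.
    private transient Properties value;

    public SerializableConf(Properties value) { this.value = value; }

    public Properties get() { return value; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeObject(new HashMap<>(value)); // write the entries explicitly
    }

    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        value = new Properties();
        value.putAll((Map<Object, Object>) in.readObject()); // rebuild, never null
    }
}
```

A reader deserializing this wrapper always gets a populated config back, which is the property the registrar-less 0.15.0-rc2 path was missing.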

[jira] [Updated] (HUDI-7807) spark-sql updates for a pk less table fails w/ partitioned table

2024-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7807:
-
Labels: pull-request-available  (was: )

> spark-sql updates for a pk less table fails w/ partitioned table 
> -
>
> Key: HUDI-7807
> URL: https://issues.apache.org/jira/browse/HUDI-7807
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> The quick start fails when trying to UPDATE with spark-sql on a pk-less table. 
>  
> {code:java}
>          > UPDATE hudi_table4 SET fare = 25.0 WHERE rider = 'rider-D';
> 24/05/28 11:44:41 WARN package: Truncated the string representation of a plan 
> since it was too large. This behavior can be adjusted by setting 
> 'spark.sql.debug.maxToStringFields'.
> 24/05/28 11:44:41 ERROR SparkSQLDriver: Failed in [UPDATE hudi_table4 SET 
> fare = 25.0 WHERE rider = 'rider-D']
> org.apache.hudi.exception.HoodieException: Unable to instantiate class 
> org.apache.hudi.keygen.SimpleKeyGenerator
>   at 
> org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:75)
>   at 
> org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:123)
>   at 
> org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:91)
>   at 
> org.apache.hudi.util.SparkKeyGenUtils$.getPartitionColumns(SparkKeyGenUtils.scala:47)
>   at 
> org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:218)
>   at 
> org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:232)
>   at 
> org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:187)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:125)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:168)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNod

[jira] [Updated] (HUDI-7805) FileSystemBasedLockProvider should delete the lock file automatically on lock conflict to avoid failing the next write

2024-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7805:
-
Labels: pull-request-available  (was: )

> FileSystemBasedLockProvider should delete the lock file automatically on lock 
> conflict to avoid failing the next write
> --
>
> Key: HUDI-7805
> URL: https://issues.apache.org/jira/browse/HUDI-7805
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: multi-writer
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
>
> org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock 
> object hdfs://aa-region/region04/2211/warehouse/hudi/odsmon_log/.hoodie/lock
>   at 
> org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:100)
>   at 
> org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:58)
>   at 
> org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1258)
>   at 
> org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1301)
>   at 
> org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:139)
>   at 
> org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:216)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:396)
>   at 
> org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:108)
>   at 
> org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.run(InsertIntoHoodieTableCommand.scala:61)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
>   at 
> org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
>   at 
> org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
>   at 
> org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
>   at org.apache.spark.sql.Dataset.(Dataset.scala:219)
>   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSess

[jira] [Updated] (HUDI-7804) Improve flink bucket index partitioner

2024-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7804:
-
Labels: pull-request-available  (was: )

> Improve flink bucket index partitioner
> --
>
> Key: HUDI-7804
> URL: https://issues.apache.org/jira/browse/HUDI-7804
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: xi chaomin
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/apache/hudi/issues/11288





[jira] [Updated] (HUDI-7507) ongoing concurrent writers with smaller timestamp can cause issues with table services

2024-05-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7507:
-
Labels: pull-request-available  (was: )

>  ongoing concurrent writers with smaller timestamp can cause issues with 
> table services
> ---
>
> Key: HUDI-7507
> URL: https://issues.apache.org/jira/browse/HUDI-7507
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: table-service
>Reporter: Krishen Bhan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
> Attachments: Flowchart (1).png, Flowchart.png
>
>
> *Scenarios:*
> Although HUDI operations hold a table lock when creating a .requested 
> instant, because HUDI writers do not generate a timestamp and create a 
> .requested plan in the same transaction, there can be a scenario where 
>  # Job 1 starts and chooses timestamp (x); Job 2 starts and chooses timestamp 
> (x - 1)
>  # Job 1 schedules and creates requested file with instant timestamp (x)
>  # Job 2 schedules and creates requested file with instant timestamp (x-1)
>  # Both jobs continue running
> If one job is writing a commit and the other is a table service, this can 
> cause issues:
>  ** If Job 2 is an ingestion commit and Job 1 is a compaction/log compaction, 
> then Job 1 may run before Job 2 and create a compaction plan covering all 
> instant times up to (x) that doesn't include instant time (x-1). Later, Job 2 
> will create instant time (x-1), but the timeline will be in a corrupted state 
> since the compaction plan was supposed to include (x-1)
>  ** There is a similar issue with clean. If Job 2 is a long-running commit 
> (that was stuck/delayed for a while before creating its .requested plan) and 
> Job 1 is a clean, then Job 1 can perform a clean that updates the 
> earliest-commit-to-retain without waiting for the inflight instant from Job 2 
> at (x-1) to complete. This causes Job 2 to be "skipped" by the clean.
>  ** If the completed commit files include some sort of "checkpointing", with 
> another "downstream job" performing incremental reads on this dataset (such 
> as Hoodie Streamer/DeltaSync), then there may be incorrect behavior, such as 
> the incremental reader skipping some completed commits (that have a smaller 
> instant timestamp than the latest completed commit but were created after it).
> [Edit] I added a diagram to visualize the issue, specifically the second 
> scenario with clean
> !Flowchart (1).png!
> *Proposed approach:*
> One way this can be resolved is by combining the operations of generating 
> instant time and creating a requested file in the same HUDI table 
> transaction. Specifically, execute the following steps whenever any instant 
> (commit, table service, etc.) is scheduled:
> Approach A
>  # Acquire table lock
>  # Look at the latest instant C on the active timeline (completed or not). 
> Generate a timestamp after C
>  # Create the plan and requested file using this new timestamp ( that is 
> greater than C)
>  # Release table lock
> Unfortunately (A) has the following drawbacks
>  * Every operation must now hold the table lock when computing its plan even 
> if it's an expensive operation and will take a while
>  * Users of HUDI cannot easily set their own instant time of an operation, 
> and this restriction would break any public APIs that allow this and would 
> require deprecating those APIs.
>  
> An alternate approach is to have every operation abort creating a .requested 
> file unless it has the latest timestamp. Specifically, for any instant type, 
> whenever an operation is about to create a .requested plan on timeline, it 
> should take the table lock and assert that there are no other instants on 
> timeline that are greater than it that could cause a conflict. If that 
> assertion fails, then throw a retry-able conflict resolution exception.
> Specifically, the following steps should be followed whenever any instant 
> (commit, table service, etc) is scheduled
> Approach B
>  # Acquire table lock. Assume that the desired instant time C and requested 
> file plan metadata have already been created, regardless of whether that was 
> before this step or right after acquiring the table lock.
>  # If there are any instants on the timeline that are greater than C 
> (regardless of their operation type or state) then release the table lock 
> and throw an exception
>  # Create requested plan on timeline (As usual)
>  # Release table lock
> Unlike (A), thi
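Approach B above can be sketched in a few lines: under the table lock, refuse to create the .requested file unless the chosen instant time is still the greatest one on the timeline. In this illustration the timeline is simulated with a sorted set of longs, and the class, method, and exception names are invented; it is not Hudi's actual API.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of Approach B: lock, assert no greater instant exists,
// create the requested plan, release the lock.
public class ApproachB {
    public static class RetryableConflict extends RuntimeException {
        public RetryableConflict(String msg) { super(msg); }
    }

    private final ReentrantLock tableLock = new ReentrantLock();
    private final ConcurrentSkipListSet<Long> timeline = new ConcurrentSkipListSet<>();

    public void schedule(long instantTime) {
        tableLock.lock();
        try {
            // Step 2: any instant greater than ours means another writer got
            // ahead; abort with a retryable conflict-resolution exception.
            if (!timeline.isEmpty() && timeline.last() > instantTime) {
                throw new RetryableConflict("instant " + instantTime
                    + " is older than " + timeline.last() + "; retry with a fresh timestamp");
            }
            timeline.add(instantTime); // stands in for writing the .requested file
        } finally {
            tableLock.unlock(); // step 4
        }
    }

    public boolean isScheduled(long instantTime) { return timeline.contains(instantTime); }
}
```

Note that, unlike Approach A, the (possibly expensive) plan computation happens before the lock is taken; only the cheap timeline check and file creation are serialized.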

[jira] [Updated] (HUDI-7655) Support configuration for clean to fail execution if at least one file is marked as a failed delete

2024-05-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7655:
-
Labels: clean pull-request-available  (was: clean)

> Support configuration for clean to fail execution if at least one 
> file is marked as a failed delete
> 
>
> Key: HUDI-7655
> URL: https://issues.apache.org/jira/browse/HUDI-7655
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Krishen Bhan
>Assignee: sivabalan narayanan
>Priority: Minor
>  Labels: clean, pull-request-available
>
> When a HUDI clean plan is executed, any targeted file that was not confirmed 
> as deleted (or non-existing) will be marked as a "failed delete". Although 
> these failed deletes will be added to `.clean` metadata, if incremental clean 
> is used then these files might never be picked up again by a future clean 
> plan, unless a "full-scan" clean ends up being scheduled. Besides 
> leaving files unnecessarily taking up storage space for longer, this 
> can lead to the following dataset consistency issue for COW datasets:
>  # Insert at C1 creates file group f1 in partition
>  # Replacecommit at RC2 creates file group f2 in partition, and replaces f1
>  # Any reader of the partition that calls the HUDI API (with or without using MDT) 
> will recognize that f1 should be ignored, as it has been replaced. This is 
> because the RC2 instant file is in the active timeline
>  # Some completed instants later, an incremental clean is scheduled. It moves 
> the "earliest commit to retain" to a time after instant time RC2, so it 
> targets f1 for deletion. But during execution of the plan, it fails to delete 
> f1.
>  # An archive job is eventually triggered, and archives C1 and RC2. Note that 
> f1 is still in the partition
> At this point, any job/query that reads the aforementioned partition directly 
> through DFS file system calls (without using the MDT FILES partition) 
> will consider both f1 and f2 as valid file groups, since RC2 is no longer in 
> the active timeline. This is a data consistency issue, and will only be 
> resolved if a "full-scan" clean is triggered and deletes f1.
> This specific scenario can be avoided if the user can configure HUDI clean to 
> fail execution of a clean plan unless all files are confirmed as deleted (or 
> already absent from DFS), "blocking" the clean. The next clean attempt 
> will re-execute this existing plan, since clean plans cannot be "rolled 
> back". 
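The proposed configuration boils down to: execute the plan, collect the failed deletes, and throw (so the .clean metadata is never committed and the plan is retried) instead of committing with files still on storage. A minimal sketch, with invented names and local files standing in for DFS, not Hudi's actual clean executor:

```java
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;

// Sketch of a clean executor with the proposed fail-on-failed-delete flag.
public class BlockingClean {

    // Returns the files that could not be confirmed deleted. A file that is
    // already absent counts as successfully cleaned.
    public static List<Path> executeClean(List<Path> targets, boolean failOnFailedDelete) {
        List<Path> failedDeletes = new ArrayList<>();
        for (Path p : targets) {
            try {
                Files.deleteIfExists(p);
            } catch (Exception e) {
                failedDeletes.add(p); // neither deleted nor confirmed absent
            }
        }
        if (failOnFailedDelete && !failedDeletes.isEmpty()) {
            // "Blocking" the clean: no .clean commit is written, so the same
            // plan is re-executed on the next clean attempt.
            throw new IllegalStateException(
                failedDeletes.size() + " file(s) failed to delete; blocking clean");
        }
        return failedDeletes;
    }
}
```

With the flag off, the sketch reproduces today's behavior: failed deletes are merely reported, and f1-style orphans can survive past archival.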





[jira] [Updated] (HUDI-7802) Fix bundle validation scripts

2024-05-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7802:
-
Labels: pull-request-available  (was: )

> Fix bundle validation scripts
> -
>
> Key: HUDI-7802
> URL: https://issues.apache.org/jira/browse/HUDI-7802
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Issues:
>  * Bundle validation with packaging/bundle-validation/ci_run.sh fails for the 
> release-0.15.0 branch due to a script issue.
>  * scripts/release/validate_staged_bundles.sh needs to include additional 
> bundles.
>  * Add release candidate validation for Scala 2.13 bundles.
>  * Disable release candidate validation by default.





[jira] [Updated] (HUDI-7801) Directly pass down HoodieStorage instance instead of recreation

2024-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7801:
-
Labels: pull-request-available  (was: )

> Directly pass down HoodieStorage instance instead of recreation
> ---
>
> Key: HUDI-7801
> URL: https://issues.apache.org/jira/browse/HUDI-7801
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> There are places that use HoodieStorage#newInstance to recreate the 
> HoodieStorage instance, which may not be necessary.





[jira] [Updated] (HUDI-7799) Optimize the access modifier of AbstractHoodieLogRecordReader#processNextRecord

2024-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7799:
-
Labels: pull-request-available  (was: )

> Optimize the access modifier of 
> AbstractHoodieLogRecordReader#processNextRecord
> ---
>
> Key: HUDI-7799
> URL: https://issues.apache.org/jira/browse/HUDI-7799
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: bradley
>Priority: Major
>  Labels: pull-request-available
>
> Correct the access modifier of the processNextRecord member method of the 
> Scanner class





[jira] [Updated] (HUDI-7798) Mark configs included in 0.15.0 release

2024-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7798:
-
Labels: pull-request-available  (was: )

> Mark configs included in 0.15.0 release
> ---
>
> Key: HUDI-7798
> URL: https://issues.apache.org/jira/browse/HUDI-7798
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> We need to mark the configs that go out in 0.15.0 release with 
> `.sinceVersion("0.15.0")`.





[jira] [Updated] (HUDI-7797) Use HoodieIOFactory to return pluggable FileFormatUtils implementation

2024-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7797:
-
Labels: pull-request-available  (was: )

> Use HoodieIOFactory to return pluggable FileFormatUtils implementation
> --
>
> Key: HUDI-7797
> URL: https://issues.apache.org/jira/browse/HUDI-7797
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
>






(hudi) branch dependabot/maven/org.apache.hive-hive-service-2.3.3 deleted (was e7e4b9e3ddf)

2024-05-25 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/org.apache.hive-hive-service-2.3.3
in repository https://gitbox.apache.org/repos/asf/hudi.git


 was e7e4b9e3ddf Update pom.xml

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.



[jira] [Updated] (HUDI-7796) Gracefully cast file system instance in Avro writers

2024-05-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7796:
-
Labels: pull-request-available  (was: )

> Gracefully cast file system instance in Avro writers
> 
>
> Key: HUDI-7796
> URL: https://issues.apache.org/jira/browse/HUDI-7796
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
>
> When running tests in Trino with Hudi MDT enabled, the following line in 
> HoodieAvroHFileWriter throws a ClassCastException:
> {code:java}
>     this.fs = (HoodieWrapperFileSystem) this.file.getFileSystem(conf); {code}
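"Gracefully" casting here means checking the runtime type before the cast, so a non-wrapper FileSystem (as seen in the Trino tests) yields an empty result instead of a ClassCastException. A sketch of that pattern follows; the two nested classes are stand-ins for the real Hadoop FileSystem and Hudi HoodieWrapperFileSystem, and the helper name is invented.

```java
import java.util.Optional;

// Sketch of a graceful downcast replacing the unconditional cast quoted above.
public class GracefulCast {
    static class FileSystem {}
    static class HoodieWrapperFileSystem extends FileSystem {}

    // Returns the wrapper when the instance actually is one, empty otherwise,
    // letting the caller fall back instead of crashing.
    public static Optional<HoodieWrapperFileSystem> asWrapper(FileSystem fs) {
        return (fs instanceof HoodieWrapperFileSystem)
            ? Optional.of((HoodieWrapperFileSystem) fs)
            : Optional.empty();
    }
}
```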





[jira] [Updated] (HUDI-7794) Bump org.apache.hive:hive-service from 2.3.1 to 2.3.3

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7794:
-
Labels: pull-request-available  (was: )

> Bump org.apache.hive:hive-service from 2.3.1 to 2.3.3
> -
>
> Key: HUDI-7794
> URL: https://issues.apache.org/jira/browse/HUDI-7794
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7795) Fix loading of input splits from look up table reader

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7795:
-
Labels: pull-request-available  (was: )

> Fix loading of input splits from look up table reader
> -
>
> Key: HUDI-7795
> URL: https://issues.apache.org/jira/browse/HUDI-7795
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
>






(hudi) branch dependabot/maven/hudi-platform-service/hudi-metaserver/com.h2database-h2-2.2.220 deleted (was 1560e9aa30c)

2024-05-24 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/hudi-platform-service/hudi-metaserver/com.h2database-h2-2.2.220
in repository https://gitbox.apache.org/repos/asf/hudi.git


 was 1560e9aa30c Bump h2 in /hudi-platform-service/hudi-metaserver




(hudi) branch dependabot/maven/packaging/hudi-metaserver-server-bundle/com.h2database-h2-2.2.220 deleted (was cb331f02ad6)

2024-05-24 Thread github-bot

github-bot pushed a change to branch 
dependabot/maven/packaging/hudi-metaserver-server-bundle/com.h2database-h2-2.2.220
in repository https://gitbox.apache.org/repos/asf/hudi.git


 was cb331f02ad6 Bump h2 in /packaging/hudi-metaserver-server-bundle




[jira] [Updated] (HUDI-7792) Bump h2 from 1.4.200 to 2.2.220 in /hudi-platform-service/hudi-metaserver

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7792:
-
Labels: pull-request-available  (was: )

> Bump h2 from 1.4.200 to 2.2.220 in /hudi-platform-service/hudi-metaserver
> -
>
> Key: HUDI-7792
> URL: https://issues.apache.org/jira/browse/HUDI-7792
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7791) Bump h2 from 1.4.200 to 2.2.220 in /packaging/hudi-metaserver-server-bundle

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7791:
-
Labels: pull-request-available  (was: )

> Bump h2 from 1.4.200 to 2.2.220 in /packaging/hudi-metaserver-server-bundle
> ---
>
> Key: HUDI-7791
> URL: https://issues.apache.org/jira/browse/HUDI-7791
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






(hudi) branch dependabot/maven/hive.version-3.1.2 deleted (was ac3ae4a66fa)

2024-05-24 Thread github-bot

github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2
in repository https://gitbox.apache.org/repos/asf/hudi.git


 was ac3ae4a66fa Merge branch 'master' into 
dependabot/maven/hive.version-3.1.2




(hudi) branch dependabot/maven/hive.version-3.1.2 created (now 14e3d559bd0)

2024-05-24 Thread github-bot

github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2
in repository https://gitbox.apache.org/repos/asf/hudi.git


  at 14e3d559bd0 Bump hive.version from 2.3.1 to 3.1.2

No new revisions were added by this update.



(hudi) branch dependabot/maven/hive.version-3.1.2 deleted (was 14e3d559bd0)

2024-05-24 Thread github-bot

github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2
in repository https://gitbox.apache.org/repos/asf/hudi.git


 was 14e3d559bd0 Bump hive.version from 2.3.1 to 3.1.2




(hudi) branch dependabot/maven/hive.version-3.1.2 created (now 14e3d559bd0)

2024-05-24 Thread github-bot

github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2
in repository https://gitbox.apache.org/repos/asf/hudi.git


  at 14e3d559bd0 Bump hive.version from 2.3.1 to 3.1.2

No new revisions were added by this update.



(hudi) branch dependabot/maven/org.apache.hive-hive-service-2.3.3 created (now da65366d649)

2024-05-24 Thread github-bot

github-bot pushed a change to branch 
dependabot/maven/org.apache.hive-hive-service-2.3.3
in repository https://gitbox.apache.org/repos/asf/hudi.git


  at da65366d649 Bump org.apache.hive:hive-service from 2.3.1 to 2.3.3

No new revisions were added by this update.



(hudi) branch dependabot/maven/hive.version-3.1.2 deleted (was 14e3d559bd0)

2024-05-24 Thread github-bot

github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2
in repository https://gitbox.apache.org/repos/asf/hudi.git


 was 14e3d559bd0 Bump hive.version from 2.3.1 to 3.1.2




(hudi) branch dependabot/maven/hive.version-3.1.2 created (now 14e3d559bd0)

2024-05-24 Thread github-bot

github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2
in repository https://gitbox.apache.org/repos/asf/hudi.git


  at 14e3d559bd0 Bump hive.version from 2.3.1 to 3.1.2

No new revisions were added by this update.



(hudi) branch dependabot/maven/hudi-platform-service/hudi-metaserver/com.h2database-h2-2.2.220 updated (e96f60a4406 -> 1560e9aa30c)

2024-05-24 Thread github-bot

github-bot pushed a change to branch 
dependabot/maven/hudi-platform-service/hudi-metaserver/com.h2database-h2-2.2.220
in repository https://gitbox.apache.org/repos/asf/hudi.git


 discard e96f60a4406 Bump h2 in /hudi-platform-service/hudi-metaserver
 add ddaef8feddb [HUDI-5101] Adding spark-structured streaming test support 
via spark-submit job (#7074)
 add e2dfb465f13 [HUDI-7495] Bump mysql-connector-java from 8.0.22 to 
8.0.28 in /hudi-platform-service/hudi-metaserver/hudi-metaserver-server (#7674)
 add e6664159bed [HUDI-7163] Fix not parsable text DateTimeParseException 
when compact (#10220)
 add 3698d49383b [HUDI-7496] Bump mybatis from 3.4.6 to 3.5.6 in 
/hudi-platform-service/hudi-metaserver/hudi-metaserver-server (#7673)
 add 819788f8651 [MINOR] Remove repetitive words in docs (#10844)
 add ee11b9c951c [HUDI-7489] Avoid collecting WriteStatus to driver in row 
writer code path (#10836)
 add 130498708bb add job context (#10848)
 add 8bc9a4bc875 [HUDI-7478] Fix max delta commits guard check w/ MDT 
(#10820)
 add f8f12ba9ef3 [MINOR] Fix and enable test 
TestHoodieDeltaStreamer.testJdbcSourceIncrementalFetchInContinuousMode (#10867)
 add aac6b2e5486 [HUDI-7382] Get partitions from active timeline instead of 
listing when building clustering plan (#10621)
 add ca2140e2003 [MINOR] rename KeyGenUtils#enableAutoGenerateRecordKeys 
(#10871)
 add 3c8488b831c [HUDI-7506] Compute offsetRanges based on 
eventsPerPartition allocated in each range (#10869)
 add e726306cf09 [HUDI-7466] Add parallel listing of existing partitions in 
Glue Catalog sync (#10460)
 add 2dcdd311245 [HUDI-7421] Build HoodieDeltaWriteStat using 
HoodieDeltaWriteStat#copy (#10870)
 add b7ccecf3205 [HUDI-7492] Fix the incorrect keygenerator specification 
for multi partition or multi primary key tables creation (#10840)
 add 7631e0dcb89 [MINOR] Add Hudi icon for idea (#10880)
 add 784af0e1786 [HUDI-7514] Update Manifest file after the parquet writer 
closed in LSMTimelineWriter (#10883)
 add 7c55ac35ba1 [HUDI-7516] Put jdbc-h2 creds into static variables for 
hudi-utilities tests (#10889)
 add 135db099afc [MINOR] Remove redundant fileId from HoodieAppendHandle 
(#10901)
 add 5a21a1dd260 [HUDI-7529] Resolve hotspots in stream read  (#10911)
 add 47151f653d8 [HUDI-7487] Fixed test with in-memory index by proper heap 
clearing (#10910)
 add 6be7205a1e3 [MINOR] Refactored `@Before*` and `@After*` in 
`HoodieDeltaStreamerTestBase` (#10912)
 add a8e9db446c3 [HUDI-7530] Refactoring of handleUpdateInternal in 
CommitActionExecutors and HoodieTables (#10908)
 add f98a40bd369 [HUDI-7499] Support FirstValueAvroPayload  for Hudi 
(#10857)
 add da9660bf38a checkstyle (#10919)
 add d749457f9d5 [HUDI-7513] Add jackson-module-scala to spark bundle 
(#10877)
 add d22bfba08fd [MINOR] Restore the setMaxParallelism setting for 
HoodieTableSource.produceDataStream (#10925)
 add 5e4a6c650e0 [HUDI-7531] Consider pending clustering when scheduling a 
new clustering plan (#10923)
 add 8a137631da8 [HUDI-7518] Fix HoodieMetadataPayload merging logic around 
repeated deletes (#10913)
 add 136d0755ad7 [HUDI-7500] fix gaps with deduce schema and null schema 
(#10858)
 add 28f67ff3561 [HUDI-7551] Avoid loading all partitions in CleanPlanner 
when MDT is enabled (#10928)
 add 4741ba06462 [HUDI-6317] Streaming read should skip compaction and 
clustering instants to avoid duplicates (#8884)
 add ae4f46874a9 [MINOR} When M3 metrics reporter type is used 
HoodieMetricsConfig should create default values for HoodieMetricsM3Config 
(#10936)
 add 06d3bb8cfbd [HUDI-6884] hudi-cli should generate correct 
HoodieTimeGeneratorConfig (#10941)
 add 26c00a3adef [HUDI-7187] Fix integ test props to honor new streamer 
properties (#10866)
 add 9b094e628d6 [HUDI-7510] Loosen the compaction scheduling and rollback 
check for MDT (#10874)
 add 44ab6f32bff [HUDI-6538] Refactor methods in TimelineDiffHelper class 
(#10938)
 add 9efced37f81 [HUDI-7557] Fix incremental cleaner when commit for 
savepoint removed (#10946)
 add bb51aca75d0 [MINOR] Upgrade mockito to 3.12.4 (#10953)
 add 8bb6bee6234 [HUDI-7564] Fix HiveSyncConfig inconsistency (#10951)
 add 59e32b7e686 [HUDI-7569] [RLI] Fix wrong result generated by query 
(#10955)
 add bf723f56cd0 [HUDI-7486] Classify schema exceptions when converting 
from avro to spark row representation (#10778)
 add 398c9a23c84 [HUDI-7564] Revert hive sync inconsistency and reason for 
it (#10959)
 add 8b61696f158 [HUDI-7556] Fixing MDT validator and adding tests (#10939)
 add 19c20e4dd93 [HUDI-7571] Add api to get exception details in 
HoodieMetadataTableValidator with ignoreFailed mode (#10960)
 add bac6ea7b26b [MINOR] Removed FSUtils.makeBaseFileName without fileExt 
param (#10963)
 add d41541cb9f8

(hudi) branch dependabot/maven/packaging/hudi-metaserver-server-bundle/com.h2database-h2-2.2.220 updated (b10b98e5d8f -> cb331f02ad6)

2024-05-24 Thread github-bot

github-bot pushed a change to branch 
dependabot/maven/packaging/hudi-metaserver-server-bundle/com.h2database-h2-2.2.220
in repository https://gitbox.apache.org/repos/asf/hudi.git


 discard b10b98e5d8f Bump h2 in /packaging/hudi-metaserver-server-bundle
 add ddaef8feddb [HUDI-5101] Adding spark-structured streaming test support 
via spark-submit job (#7074)
 add e2dfb465f13 [HUDI-7495] Bump mysql-connector-java from 8.0.22 to 
8.0.28 in /hudi-platform-service/hudi-metaserver/hudi-metaserver-server (#7674)
 add e6664159bed [HUDI-7163] Fix not parsable text DateTimeParseException 
when compact (#10220)
 add 3698d49383b [HUDI-7496] Bump mybatis from 3.4.6 to 3.5.6 in 
/hudi-platform-service/hudi-metaserver/hudi-metaserver-server (#7673)
 add 819788f8651 [MINOR] Remove repetitive words in docs (#10844)
 add ee11b9c951c [HUDI-7489] Avoid collecting WriteStatus to driver in row 
writer code path (#10836)
 add 130498708bb add job context (#10848)
 add 8bc9a4bc875 [HUDI-7478] Fix max delta commits guard check w/ MDT 
(#10820)
 add f8f12ba9ef3 [MINOR] Fix and enable test 
TestHoodieDeltaStreamer.testJdbcSourceIncrementalFetchInContinuousMode (#10867)
 add aac6b2e5486 [HUDI-7382] Get partitions from active timeline instead of 
listing when building clustering plan (#10621)
 add ca2140e2003 [MINOR] rename KeyGenUtils#enableAutoGenerateRecordKeys 
(#10871)
 add 3c8488b831c [HUDI-7506] Compute offsetRanges based on 
eventsPerPartition allocated in each range (#10869)
 add e726306cf09 [HUDI-7466] Add parallel listing of existing partitions in 
Glue Catalog sync (#10460)
 add 2dcdd311245 [HUDI-7421] Build HoodieDeltaWriteStat using 
HoodieDeltaWriteStat#copy (#10870)
 add b7ccecf3205 [HUDI-7492] Fix the incorrect keygenerator specification 
for multi partition or multi primary key tables creation (#10840)
 add 7631e0dcb89 [MINOR] Add Hudi icon for idea (#10880)
 add 784af0e1786 [HUDI-7514] Update Manifest file after the parquet writer 
closed in LSMTimelineWriter (#10883)
 add 7c55ac35ba1 [HUDI-7516] Put jdbc-h2 creds into static variables for 
hudi-utilities tests (#10889)
 add 135db099afc [MINOR] Remove redundant fileId from HoodieAppendHandle 
(#10901)
 add 5a21a1dd260 [HUDI-7529] Resolve hotspots in stream read  (#10911)
 add 47151f653d8 [HUDI-7487] Fixed test with in-memory index by proper heap 
clearing (#10910)
 add 6be7205a1e3 [MINOR] Refactored `@Before*` and `@After*` in 
`HoodieDeltaStreamerTestBase` (#10912)
 add a8e9db446c3 [HUDI-7530] Refactoring of handleUpdateInternal in 
CommitActionExecutors and HoodieTables (#10908)
 add f98a40bd369 [HUDI-7499] Support FirstValueAvroPayload  for Hudi 
(#10857)
 add da9660bf38a checkstyle (#10919)
 add d749457f9d5 [HUDI-7513] Add jackson-module-scala to spark bundle 
(#10877)
 add d22bfba08fd [MINOR] Restore the setMaxParallelism setting for 
HoodieTableSource.produceDataStream (#10925)
 add 5e4a6c650e0 [HUDI-7531] Consider pending clustering when scheduling a 
new clustering plan (#10923)
 add 8a137631da8 [HUDI-7518] Fix HoodieMetadataPayload merging logic around 
repeated deletes (#10913)
 add 136d0755ad7 [HUDI-7500] fix gaps with deduce schema and null schema 
(#10858)
 add 28f67ff3561 [HUDI-7551] Avoid loading all partitions in CleanPlanner 
when MDT is enabled (#10928)
 add 4741ba06462 [HUDI-6317] Streaming read should skip compaction and 
clustering instants to avoid duplicates (#8884)
 add ae4f46874a9 [MINOR} When M3 metrics reporter type is used 
HoodieMetricsConfig should create default values for HoodieMetricsM3Config 
(#10936)
 add 06d3bb8cfbd [HUDI-6884] hudi-cli should generate correct 
HoodieTimeGeneratorConfig (#10941)
 add 26c00a3adef [HUDI-7187] Fix integ test props to honor new streamer 
properties (#10866)
 add 9b094e628d6 [HUDI-7510] Loosen the compaction scheduling and rollback 
check for MDT (#10874)
 add 44ab6f32bff [HUDI-6538] Refactor methods in TimelineDiffHelper class 
(#10938)
 add 9efced37f81 [HUDI-7557] Fix incremental cleaner when commit for 
savepoint removed (#10946)
 add bb51aca75d0 [MINOR] Upgrade mockito to 3.12.4 (#10953)
 add 8bb6bee6234 [HUDI-7564] Fix HiveSyncConfig inconsistency (#10951)
 add 59e32b7e686 [HUDI-7569] [RLI] Fix wrong result generated by query 
(#10955)
 add bf723f56cd0 [HUDI-7486] Classify schema exceptions when converting 
from avro to spark row representation (#10778)
 add 398c9a23c84 [HUDI-7564] Revert hive sync inconsistency and reason for 
it (#10959)
 add 8b61696f158 [HUDI-7556] Fixing MDT validator and adding tests (#10939)
 add 19c20e4dd93 [HUDI-7571] Add api to get exception details in 
HoodieMetadataTableValidator with ignoreFailed mode (#10960)
 add bac6ea7b26b [MINOR] Removed FSUtils.makeBaseFileName without fileExt 
param (#10963)
 add d41541cb9f8

[jira] [Updated] (HUDI-7790) Revert changes in DFSPathSelector and UtilHelpers.readConfig

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7790:
-
Labels: pull-request-available  (was: )

> Revert changes in DFSPathSelector and UtilHelpers.readConfig
> 
>
> Key: HUDI-7790
> URL: https://issues.apache.org/jira/browse/HUDI-7790
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> This is to avoid behavior changes in DFSPathSelector and keep the 
> UtilHelpers.readConfig API the same as before.
>  





[jira] [Updated] (HUDI-7788) Fixing exception handling in AverageRecordSizeUtils

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7788:
-
Labels: pull-request-available  (was: )

> Fixing exception handling in AverageRecordSizeUtils
> ---
>
> Key: HUDI-7788
> URL: https://issues.apache.org/jira/browse/HUDI-7788
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> We should catch Throwable to avoid any issue during record size estimation.





[jira] [Updated] (HUDI-7787) Reload the data for lookup table when found the newer commit instance

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7787:
-
Labels: pull-request-available  (was: )

> Reload the data for lookup table when found the newer commit instance
> -
>
> Key: HUDI-7787
> URL: https://issues.apache.org/jira/browse/HUDI-7787
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: hehuiyuan
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7786) Fix roaring bitmap dependency in hudi-integ-test-bundle

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7786:
-
Labels: pull-request-available  (was: )

> Fix roaring bitmap dependency in hudi-integ-test-bundle
> ---
>
> Key: HUDI-7786
> URL: https://issues.apache.org/jira/browse/HUDI-7786
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7785) Keep public APIs in utilities module the same as before HoodieStorage abstraction

2024-05-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7785:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Keep public APIs in utilities module the same as before HoodieStorage 
> abstraction
> -
>
> Key: HUDI-7785
> URL: https://issues.apache.org/jira/browse/HUDI-7785
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> BaseErrorTableWriter, HoodieStreamer, StreamSync, etc., are public API 
> classes and contain public API methods, which should be kept the same as 
> before.





[jira] [Updated] (HUDI-4491) Re-enable TestHoodieFlinkQuickstart

2024-05-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4491:
-
Labels: pull-request-available  (was: )

> Re-enable TestHoodieFlinkQuickstart 
> 
>
> Key: HUDI-4491
> URL: https://issues.apache.org/jira/browse/HUDI-4491
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Shawn Chang
>Priority: Major
>  Labels: pull-request-available
>
> This test was previously disabled due to its flakiness. We need to re-enable it.





[jira] [Updated] (HUDI-7784) Fix serde of HoodieHadoopConfiguration in Spark

2024-05-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7784:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Fix serde of HoodieHadoopConfiguration in Spark
> ---
>
> Key: HUDI-7784
> URL: https://issues.apache.org/jira/browse/HUDI-7784
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7783) Fix connection leak in FileSystemBasedLockProvider

2024-05-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7783:
-
Labels: pull-request-available  (was: )

> Fix connection leak in FileSystemBasedLockProvider
> --
>
> Key: HUDI-7783
> URL: https://issues.apache.org/jira/browse/HUDI-7783
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
>
> Fix connection leak in FileSystemBasedLockProvider





[jira] [Updated] (HUDI-7777) Allow HoodieTableMetaClient to take HoodieStorage instance directly

2024-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

>  Allow HoodieTableMetaClient to take HoodieStorage instance directly
> 
>
> Key: HUDI-
> URL: https://issues.apache.org/jira/browse/HUDI-
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> We need the functionality for the meta client to take a HoodieStorage instance directly.





[jira] [Updated] (HUDI-7774) MercifulJsonConvertor should support Avro logical type

2024-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7774:
-
Labels: pull-request-available  (was: )

> MercifulJsonConvertor should support Avro logical type
> --
>
> Key: HUDI-7774
> URL: https://issues.apache.org/jira/browse/HUDI-7774
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Davis Zhang
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> MercifulJsonConverter should be able to convert raw JSON string entries to an 
> Avro GenericRecord whose format complies with the required Avro schema.
>  
> The list of conversions we should support, with accepted input types:
>  * UUID: String
>  * Decimal: Number, Number with String representation
>  * Date: Either Number / String Number or human readable timestamp in 
> DateTimeFormatter.ISO_LOCAL_DATE format
>  * Time (milli/micro sec): Number / String Number or human readable timestamp 
> in 
> DateTimeFormatter.ISO_LOCAL_TIME format
>  * Timestamp (milli/micro second): Number / String Number or human readable 
> timestamp in DateTimeFormatter.ISO_INSTANT format
>  * Local Timestamp (milli/micro second): Number / String Number or human 
> readable timestamp in DateTimeFormatter.ISO_LOCAL_DATE_TIME format
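The date case above can be sketched with plain `java.time`; the method name and the exact fallback order below are illustrative assumptions, not the converter's actual API:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class LenientDateParse {
    // A date field may arrive as a number (epoch days), a numeric string,
    // or a human-readable ISO_LOCAL_DATE string; normalize all three to
    // epoch days as required by the Avro "date" logical type.
    static long toEpochDays(Object raw) {
        if (raw instanceof Number) {
            return ((Number) raw).longValue();
        }
        String s = raw.toString();
        try {
            return Long.parseLong(s);                  // e.g. "19868"
        } catch (NumberFormatException e) {
            // e.g. "2024-05-25"
            return LocalDate.parse(s, DateTimeFormatter.ISO_LOCAL_DATE).toEpochDay();
        }
    }

    public static void main(String[] args) {
        System.out.println(toEpochDays(19868));        // 19868
        System.out.println(toEpochDays("19868"));      // 19868
        System.out.println(toEpochDays("2024-05-25")); // 19868
    }
}
```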





[jira] [Updated] (HUDI-7781) Filter wrong partitions when using hoodie.datasource.write.partitions.to.delete

2024-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7781:
-
Labels: pull-request-available  (was: )

> Filter wrong partitions when using 
> hoodie.datasource.write.partitions.to.delete
> ---
>
> Key: HUDI-7781
> URL: https://issues.apache.org/jira/browse/HUDI-7781
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xinyu Zou
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7776) Simplify HoodieStorage instance fetching

2024-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7776:
-
Labels: pull-request-available  (was: )

> Simplify HoodieStorage instance fetching
> 
>
> Key: HUDI-7776
> URL: https://issues.apache.org/jira/browse/HUDI-7776
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7778) Duplicate Key exception with RLI

2024-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7778:
-
Labels: pull-request-available  (was: )

> Duplicate Key exception with RLI 
> -
>
> Key: HUDI-7778
> URL: https://issues.apache.org/jira/browse/HUDI-7778
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> We are occasionally hitting the exception below, meaning two records are 
> ingested into RLI for the same record key from the data table. This is not 
> expected to happen. 
>  
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while 
> appending records to 
> file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y14gn/T/junit2792173348364470678/.hoodie/metadata/record_index/.record-index-0009-0_00011.log.3_3-275-476
>  at 
> org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475)
>  at 
> org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:439)  
> at 
> org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
>  at 
> org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:355)
>   ... 28 moreCaused by: org.apache.hudi.exception.HoodieException: 
> Writing multiple records with same key 1 not supported for 
> org.apache.hudi.common.table.log.block.HoodieHFileDataBlock at 
> org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:146)
>   at 
> org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:121)
>  at 
> org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:166)
>   at 
> org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:467)
>  ... 31 more
> Driver stacktrace:51301 [main] INFO  org.apache.spark.scheduler.DAGScheduler 
> [] - Job 78 failed: collect at HoodieJavaRDD.java:177, took 0.245313 s51303 
> [main] INFO  org.apache.hudi.client.BaseHoodieClient [] - Stopping Timeline 
> service !!51303 [main] INFO  
> org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closing Timeline 
> server51303 [main] INFO  org.apache.hudi.timeline.service.TimelineService [] 
> - Closing Timeline Service51321 [main] INFO  
> org.apache.hudi.timeline.service.TimelineService [] - Closed Timeline 
> Service51321 [main] INFO  
> org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closed Timeline 
> server
> org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
> time 197001012
>   at 
> org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:80)
>at 
> org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:47)
>   at 
> org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:98)
> at 
> org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:88)
> at 
> org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:156)
>   at 
> org.apache.hudi.functional.TestGlobalIndexEnableUpdatePartitions.testUdpateSubsetOfRecUpdates(TestGlobalIndexEnableUpdatePartitions.java:225)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
>at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>   at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
>  
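The underlying constraint is that an HFile-backed data block requires unique, sorted keys. A pre-serialization duplicate check, sketched below in plain Java (not the Hudi API), can surface the offending record key before the block writer fails:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateKeyCheck {
    // Return the first record key that appears more than once in the
    // batch destined for one data block, or null if all keys are unique.
    static String firstDuplicateKey(List<String> recordKeys) {
        Set<String> seen = new HashSet<>();
        for (String key : recordKeys) {
            if (!seen.add(key)) {
                return key; // same key emitted twice for this block
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstDuplicateKey(Arrays.asList("1", "2", "1"))); // 1
        System.out.println(firstDuplicateKey(Arrays.asList("1", "2", "3"))); // null
    }
}
```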

[jira] [Updated] (HUDI-7775) Remove unused APIs in HoodieStorage

2024-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7775:
-
Labels: pull-request-available  (was: )

> Remove unused APIs in HoodieStorage
> ---
>
> Key: HUDI-7775
> URL: https://issues.apache.org/jira/browse/HUDI-7775
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7761) Make the manifest Writer Extendable

2024-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7761:
-
Labels: pull-request-available  (was: )

> Make the manifest Writer Extendable
> ---
>
> Key: HUDI-7761
> URL: https://issues.apache.org/jira/browse/HUDI-7761
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Sivaguru Kannan
>Priority: Major
>  Labels: pull-request-available
>
> * Make the manifest writer extendable so that clients can plug in a custom 
> manifest writer instance for their syncs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5505) Compaction NUM_COMMITS policy should only judge completed deltacommit

2024-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5505:
-
Labels: pull-request-available  (was: )

> Compaction NUM_COMMITS policy should only judge completed deltacommit
> -
>
> Key: HUDI-5505
> URL: https://issues.apache.org/jira/browse/HUDI-5505
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: compaction, table-service
>Reporter: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-01-05-13-10-57-918.png
>
>
> `compaction.delta_commits =1`
>  
> {code:java}
> 20230105115229301.deltacommit
> 20230105115229301.deltacommit.inflight
> 20230105115229301.deltacommit.requested
> 20230105115253118.commit
> 20230105115253118.compaction.inflight
> 20230105115253118.compaction.requested
> 20230105115330994.deltacommit.inflight
> 20230105115330994.deltacommit.requested{code}
> `ScheduleCompactionActionExecutor.needCompact` returns `true` here, which is 
> not expected.
>  
> In OCC or lazy-clean mode, this causes compaction to be triggered early.
> `compaction.delta_commits =3`
>  
> {code:java}
> 20230105125650541.deltacommit.inflight
> 20230105125650541.deltacommit.requested
> 20230105125715081.deltacommit
> 20230105125715081.deltacommit.inflight
> 20230105125715081.deltacommit.requested
> 20230105130018070.deltacommit.inflight
> 20230105130018070.deltacommit.requested {code}
>  
> Compaction will still be triggered, which is not expected.
> !image-2023-01-05-13-10-57-918.png|width=699,height=158!
>  
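The fix this ticket asks for can be sketched as follows — a minimal, hypothetical model (not Hudi's actual implementation) in which the NUM_COMMITS policy counts only *completed* deltacommits, i.e. instants with a plain `.deltacommit` file and no `.inflight`/`.requested` state suffix; the file names come from the listing above:

```python
# Hypothetical sketch: count only completed deltacommits when deciding
# whether the NUM_COMMITS compaction policy should fire.
def completed_deltacommits(timeline_files):
    # Completed instants have no ".inflight"/".requested" suffix.
    return {f.split(".")[0] for f in timeline_files if f.endswith(".deltacommit")}

def need_compact(timeline_files, delta_commits):
    return len(completed_deltacommits(timeline_files)) >= delta_commits

timeline = [
    "20230105125650541.deltacommit.inflight",
    "20230105125650541.deltacommit.requested",
    "20230105125715081.deltacommit",
    "20230105125715081.deltacommit.inflight",
    "20230105125715081.deltacommit.requested",
    "20230105130018070.deltacommit.inflight",
    "20230105130018070.deltacommit.requested",
]
# Only one deltacommit is complete, so compaction.delta_commits = 3 must not fire.
print(need_compact(timeline, 3))  # False
```

Under this model the second timeline in the ticket correctly yields no compaction, because its two pending instants are ignored.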





[jira] [Updated] (HUDI-7770) Bootstrap read tries to parse partition from the bootstrap base path

2024-05-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7770:
-
Labels: pull-request-available  (was: )

> Bootstrap read tries to parse partition from the bootstrap base path
> 
>
> Key: HUDI-7770
> URL: https://issues.apache.org/jira/browse/HUDI-7770
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap, spark
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
>
> Bootstrap gets the partition path values from the bootstrap base path when 
> reading the base file, but from the Hudi table path in all other cases. Use 
> the Hudi table path in all cases to keep partition parsing simpler.
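A minimal sketch of the proposed simplification, under the assumption that the partition path is simply the file's directory relative to the Hudi table base path (illustrative only — paths and the helper name are invented):

```python
# Hypothetical sketch: always derive the partition path by relativizing the
# file path against the Hudi table base path, never the bootstrap source path.
def partition_path(table_base, file_path):
    base = table_base.rstrip("/")
    if not file_path.startswith(base + "/"):
        raise ValueError("file is not under the table base path")
    rel = file_path[len(base) + 1:]
    # Drop the file name; the remaining directories form the partition path.
    return "/".join(rel.split("/")[:-1])

print(partition_path("/tables/trips", "/tables/trips/2024/05/16/abc.parquet"))
# 2024/05/16
```

A non-partitioned table degenerates cleanly: a file directly under the base path yields an empty partition path.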





[jira] [Updated] (HUDI-7772) HoodieTimelineArchiver##getCommitInstantsToArchive needs to skip limiting archiving of instants

2024-05-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7772:
-
Labels: pull-request-available  (was: )

> HoodieTimelineArchiver##getCommitInstantsToArchive needs to skip limiting 
> archiving of instants
> ---
>
> Key: HUDI-7772
> URL: https://issues.apache.org/jira/browse/HUDI-7772
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: archiving
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
>
> When a user alters a table by adding a column and then inserts new data with 
> the metadata table (MDT) enabled, the write errors out as follows; from the 
> stack trace we find that FileSystemBackedTableMetadata does not support this.
> org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
> at org.apache.hudi.client.HoodieTimelineArchiver.getInstantsToArchive(HoodieTimelineArchiver.java:522)
> at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:167)
> at org.apache.hudi.client.BaseHoodieTableServiceClient.archive(BaseHoodieTableServiceClient.java:791)
> at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:890)
> at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:619)
> at org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:585)
> at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:248)
> at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:104)
> at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1020)
> at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:405)
> at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:108)
> at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.run(InsertIntoHoodieTableCommand.scala:61)
> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78)
> at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89)
> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transfo

[jira] [Updated] (HUDI-7769) Fix Hudi CDC read on Spark 3.3.4 and 3.4.3

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7769:
-
Labels: pull-request-available  (was: )

> Fix Hudi CDC read on Spark 3.3.4 and 3.4.3
> --
>
> Key: HUDI-7769
> URL: https://issues.apache.org/jira/browse/HUDI-7769
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7771) Make default hoodie record payload as OverwriteWithLatestPayload for 0.15.0

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7771:
-
Labels: pull-request-available  (was: )

> Make default hoodie record payload as OverwriteWithLatestPayload for 0.15.0
> ---
>
> Key: HUDI-7771
> URL: https://issues.apache.org/jira/browse/HUDI-7771
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>
> We made "DefaultHoodieRecordPayload" the default for 1.x, but let's keep it 
> as OverwriteWithLatestAvroPayload for 0.15.0.





[jira] [Updated] (HUDI-7767) Revert Spark 3.3 and 3.4 upgrades

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7767:
-
Labels: pull-request-available  (was: )

> Revert Spark 3.3 and 3.4 upgrades 
> --
>
> Key: HUDI-7767
> URL: https://issues.apache.org/jira/browse/HUDI-7767
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7766) Adding staging jar deployment command for Spark 3.5 and Scala 2.13 profile

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7766:
-
Labels: pull-request-available  (was: )

> Adding staging jar deployment command for Spark 3.5 and Scala 2.13 profile
> --
>
> Key: HUDI-7766
> URL: https://issues.apache.org/jira/browse/HUDI-7766
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7765) Turn off native HFile reader for 0.15.0 release

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7765:
-
Labels: pull-request-available  (was: )

> Turn off native HFile reader for 0.15.0 release
> ---
>
> Key: HUDI-7765
> URL: https://issues.apache.org/jira/browse/HUDI-7765
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>






[jira] [Updated] (HUDI-7768) Fix failing tests for 0.15.0 release (async compaction and metadata num commits check)

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7768:
-
Labels: pull-request-available  (was: )

> Fix failing tests for 0.15.0 release (async compaction and metadata num 
> commits check)
> --
>
> Key: HUDI-7768
> URL: https://issues.apache.org/jira/browse/HUDI-7768
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
>  
>  
> [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=23953=logs=600e7de6-e133-5e69-e615-50ee129b3c08=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7]
> [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=23953=logs=7601efb9-4019-552e-11ba-eb31b66593b2=d4b4e11d-8e26-50e5-a0d9-bb2d5decfeb9]
> org.apache.hudi.exception.HoodieMetadataException: Metadata table's 
> deltacommits exceeded 3: this is likely caused by a pending instant in the 
> data table. Resolve the pending instant or adjust 
> `hoodie.metadata.max.deltacommits.when_pending`, then restart the pipeline.
> at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.checkNumDeltaCommits(HoodieBackedTableMetadataWriter.java:835)
> at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateTimelineBeforeSchedulingCompaction(HoodieBackedTableMetadataWriter.java:1367)
> java.lang.IllegalArgumentException: Following instants have timestamps >= 
> compactionInstant (002) Instants 
> :[[004__deltacommit__COMPLETED__20240515123806398]]
> at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:42)
> at org.apache.hudi.table.action.compact.ScheduleCompactionActionExecutor.execute(ScheduleCompactionActionExecutor.java:108)
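The second failure above is the timeline guard in ScheduleCompactionActionExecutor. A toy model of that check — assuming, as the zero-padded instant format allows, that instant timestamps compare lexicographically — looks like this (function name invented):

```python
# Hypothetical sketch of the scheduling guard: a compaction instant t is only
# valid if no completed deltacommit has a timestamp >= t.
def violating_instants(compaction_instant, completed_deltacommits):
    return [i for i in completed_deltacommits if i >= compaction_instant]

# Mirrors the failure in the log: deltacommit 004 is already complete when a
# compaction is scheduled at 002, so the guard must raise.
print(violating_instants("002", ["004"]))  # ['004'] -> scheduling fails
print(violating_instants("005", ["004"]))  # [] -> scheduling is allowed
```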





[jira] [Updated] (HUDI-7764) DefaultHoodieRecordPayload should be projection compatible

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7764:
-
Labels: pull-request-available  (was: )

> DefaultHoodieRecordPayload should be projection compatible
> --
>
> Key: HUDI-7764
> URL: https://issues.apache.org/jira/browse/HUDI-7764
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>
> DefaultHoodieRecordPayload is not listed as projection compatible. Therefore, 
> with the relation reader we end up reading all the columns for MOR reads.





[jira] [Updated] (HUDI-7763) Fix that jmx reporter cannot initialized if metadata enables

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7763:
-
Labels: metrics pull-request-available  (was: metrics)

> Fix that jmx reporter cannot initialized if metadata enables
> 
>
> Key: HUDI-7763
> URL: https://issues.apache.org/jira/browse/HUDI-7763
> Project: Apache Hudi
>  Issue Type: Bug
> Environment: hudi0.14.1, Spark3.2
>Reporter: Jihwan Lee
>Priority: Major
>  Labels: metrics, pull-request-available
>
> If the JMX metrics option is activated, the port setting can be a range.
>  
> Because the metadata table is also written as a Hudi table, multiple metrics 
> instances are required (otherwise the exception 'ObjID already in use' 
> occurs). Each JmxReporterServer can use only one port, so the JMX server 
> should be able to be initialized on multiple ports.
>  
> error log:
>  (the JMX reporter for the metadata table is initialized first; the reporter 
> for the data table then throws the exception)
> {code:java}
> 24/05/13 20:28:27 INFO table.HoodieTableMetaClient: Loading 
> HoodieTableMetaClient from 
> /data/feeder/affiliate/book/affiliate_feeder_book_svc
> 24/05/13 20:28:27 INFO table.HoodieTableConfig: Loading table properties from 
> /data/feeder/affiliate/book/affiliate_feeder_book_svc/.hoodie/hoodie.properties
> 24/05/13 20:28:27 INFO table.HoodieTableMetaClient: Finished Loading Table of 
> type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from 
> /data/feeder/affiliate/book/affiliate_feeder_book_svc
> 24/05/13 20:28:27 INFO table.HoodieTableMetaClient: Loading 
> HoodieTableMetaClient from 
> /data/feeder/affiliate/book/affiliate_feeder_book_svc
> 24/05/13 20:28:27 INFO table.HoodieTableConfig: Loading table properties from 
> /data/feeder/affiliate/book/affiliate_feeder_book_svc/.hoodie/hoodie.properties
> 24/05/13 20:28:27 INFO table.HoodieTableMetaClient: Finished Loading Table of 
> type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from 
> /data/feeder/affiliate/book/affiliate_feeder_book_svc
> 24/05/13 20:28:27 INFO timeline.HoodieActiveTimeline: Loaded instants upto : 
> Option{val=[==>20240513195519782__deltacommit__REQUESTED__20240513195521160]}
> 24/05/13 20:28:28 INFO config.HoodieWriteConfig: Automatically set 
> hoodie.cleaner.policy.failed.writes=LAZY since optimistic concurrency control 
> is used
> 24/05/13 20:28:28 INFO metrics.JmxMetricsReporter: Started JMX server on port 
> 9889.
> 24/05/13 20:28:28 INFO metrics.JmxMetricsReporter: Configured JMXReporter 
> with {port:9889}
> 24/05/13 20:28:28 INFO embedded.EmbeddedTimelineService: Overriding hostIp to 
> (feeder-affiliate-book-svc-sink-09c3c08f71b47a5d-driver-svc.csp.svc) found in 
> spark-conf. It was null
> 24/05/13 20:28:28 INFO view.FileSystemViewManager: Creating View Manager with 
> storage type :MEMORY
> 24/05/13 20:28:28 INFO view.FileSystemViewManager: Creating in-memory based 
> Table View
> 24/05/13 20:28:28 INFO util.log: Logging initialized @53678ms to 
> org.apache.hudi.org.apache.jetty.util.log.Slf4jLog
> 24/05/13 20:28:28 INFO javalin.Javalin:
> [Javalin ASCII art banner]
>           https://javalin.io/documentation
> 24/05/13 20:28:28 INFO javalin.Javalin: Starting Javalin ...
> 24/05/13 20:28:28 INFO javalin.Javalin: You are running Javalin 4.6.7 
> (released October 24, 2022. Your Javalin version is 567 days old. Consider 
> checking for a newer version.).
> 24/05/13 20:28:28 INFO server.Server: jetty-9.4.48.v20220622; built: 
> 2022-06-21T20:42:25.880Z; git: 6b67c5719d1f4371b33655ff2d047d24e171e49a; jvm 
> 11.0.20.1+1
> 24/05/13 20:28:28 INFO server.Server: Started @54065ms
> 24/05/13 20:28:28 INFO javalin.Javalin: Listening on http://localhost:35071/
> 24/05/13 20:28:28 INFO javalin.Javalin: Javalin started in 177ms \o/
> 24/05/13 20:28:28 INFO service.TimelineService: Starting Timeline server on 
> port :35071
> 24/05/13 20:28:28 INFO embedded.EmbeddedTimelineService: Started embedded 
> timeline server at 
> feeder-affiliate-book-svc-sink-09c3c08f71b47a5d-driver-svc.csp.svc:35071
> 24/05/13 20:28:28 INFO client.BaseHoodieClient: Timeline Server already 
> running. Not restarting the service
> 24/05/13 20:28:28 INFO hudi.HoodieSparkSqlWriterInternal: 
> Config.inlineCompactionEnabled ? true
> 24/05/13 20:28:28 INFO hudi.HoodieSparkSqlWr

[jira] [Updated] (HUDI-7762) Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In Spark3.5

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7762:
-
Labels: pull-request-available  (was: )

> Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In 
> Spark3.5
> -
>
> Key: HUDI-7762
> URL: https://issues.apache.org/jira/browse/HUDI-7762
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ma Jian
>Priority: Major
>  Labels: pull-request-available
>
> In Hudi, the Spark3_5Adapter calls v2.v1Table, which in turn invokes logic 
> within Delta Lake. When executed on a Delta table, this may result in an 
> error. Therefore, the check for whether an operation targets a Hudi table has 
> been changed to a class-name check to prevent errors during Delta Lake 
> executions.





[jira] [Updated] (HUDI-7759) Remove Hadoop dependencies in hudi-common module

2024-05-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7759:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Remove Hadoop dependencies in hudi-common module
> 
>
> Key: HUDI-7759
> URL: https://issues.apache.org/jira/browse/HUDI-7759
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7758) MDT Initialization Parses Non-Hudi files

2024-05-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7758:
-
Labels: pull-request-available  (was: )

> MDT Initialization Parses Non-Hudi files
> 
>
> Key: HUDI-7758
> URL: https://issues.apache.org/jira/browse/HUDI-7758
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> Right now the MDT initialization will parse files that do not belong to the 
> Hudi table.





[jira] [Updated] (HUDI-7717) hoodie.combine.before.insert silently broken for bulk_insert if meta fields disabled (causes duplicates)

2024-05-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7717:
-
Labels: pull-request-available  (was: )

> hoodie.combine.before.insert silently broken for bulk_insert if meta fields 
> disabled (causes duplicates)
> 
>
> Key: HUDI-7717
> URL: https://issues.apache.org/jira/browse/HUDI-7717
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: Aditya Goenka
>Assignee: Geser Dugarov
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>
> Github issue - [https://github.com/apache/hudi/issues/11044]
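For context, `hoodie.combine.before.insert` is supposed to deduplicate incoming records by record key before the write, so that bulk_insert does not produce duplicates. A toy model of that behavior (field names invented for illustration) is:

```python
# Hypothetical sketch of combine-before-insert: keep one record per record
# key, preferring the higher ordering value, before handing the batch to
# bulk_insert.
def combine_before_insert(records, key_field="key", ordering_field="ts"):
    best = {}
    for rec in records:
        k = rec[key_field]
        # Keep the record with the largest ordering value for each key.
        if k not in best or rec[ordering_field] > best[k][ordering_field]:
            best[k] = rec
    return list(best.values())

rows = [
    {"key": "r1", "ts": 1, "val": "a"},
    {"key": "r1", "ts": 2, "val": "b"},  # newer duplicate of r1
    {"key": "r2", "ts": 1, "val": "c"},
]
print(sorted(r["val"] for r in combine_before_insert(rows)))  # ['b', 'c']
```

The bug in the linked issue is that this dedup step was silently skipped for bulk_insert when meta fields are disabled, so both copies of `r1` would be written.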





[jira] [Updated] (HUDI-7752) Abstract serializeRecords for log writing

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7752:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Abstract serializeRecords for log writing
> -
>
> Key: HUDI-7752
> URL: https://issues.apache.org/jira/browse/HUDI-7752
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7754) Remove AvroWriteSupport and ParquetReaderIterator from hudi-common

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7754:
-
Labels: pull-request-available  (was: )

> Remove AvroWriteSupport and ParquetReaderIterator from hudi-common
> --
>
> Key: HUDI-7754
> URL: https://issues.apache.org/jira/browse/HUDI-7754
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Two classes with Hadoop dependencies that can be moved to hudi-hadoop-common 
> and aren't covered by other PRs.





[jira] [Updated] (HUDI-7750) Move HoodieLogFormatWriter class to hoodie-hadoop-common module

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7750:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Move HoodieLogFormatWriter class to hoodie-hadoop-common module
> ---
>
> Key: HUDI-7750
> URL: https://issues.apache.org/jira/browse/HUDI-7750
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7589) Add API to create HoodieStorage in HoodieIOFactory

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7589:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Add API to create HoodieStorage in HoodieIOFactory
> --
>
> Key: HUDI-7589
> URL: https://issues.apache.org/jira/browse/HUDI-7589
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> We should use the HoodieIOFactory to create the HoodieStorage instance, 
> replacing the hardcoded reflection logic.





[jira] [Updated] (HUDI-7749) Upgrade Spark patch version to include a fix related to data correctness

2024-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7749:
-
Labels: pull-request-available  (was: )

> Upgrade Spark patch version to include a fix related to data correctness
> 
>
> Key: HUDI-7749
> URL: https://issues.apache.org/jira/browse/HUDI-7749
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/SPARK-44805 shows a data correctness 
> issue with Spark 3.3.1 and 3.4.1. We have already upgraded to Spark 3.4.3 in 
> [https://github.com/apache/hudi/commit/cdd146b2c73d50a28bee9f712b689df4fc923222.]
>  We should also upgrade to 3.3.4. The issue does not affect 3.2.x.





[jira] [Updated] (HUDI-7748) Add logs and drop _hoodie_is_deleted in Transformer

2024-05-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7748:
-
Labels: pull-request-available  (was: )

> Add logs and drop _hoodie_is_deleted in Transformer
> ---
>
> Key: HUDI-7748
> URL: https://issues.apache.org/jira/browse/HUDI-7748
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7745:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Move Hadoop-dependent util methods to hudi-hadoop-common
> 
>
> Key: HUDI-7745
> URL: https://issues.apache.org/jira/browse/HUDI-7745
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7744) Create HoodieIOFactory and config to set it

2024-05-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7744:
-
Labels: pull-request-available  (was: )

> Create HoodieIOFactory and config to set it
> ---
>
> Key: HUDI-7744
> URL: https://issues.apache.org/jira/browse/HUDI-7744
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core, writer-core
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Create HoodieIOFactory that will give the appropriate reader and writer 
> factories based on a config.
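The idea — a factory resolved from a config value that hands out the appropriate reader/writer factories instead of hardcoded reflection — can be sketched like this (class names and the config key are invented for illustration):

```python
# Hypothetical sketch of a configurable IO factory: the concrete factory is
# looked up from a config entry rather than being hardcoded.
class ParquetIOFactory:
    def reader_format(self):
        return "parquet"

class OrcIOFactory:
    def reader_format(self):
        return "orc"

_FACTORIES = {"parquet": ParquetIOFactory, "orc": OrcIOFactory}

def get_io_factory(config):
    # "hoodie.io.factory.format" is an invented key for this sketch.
    name = config.get("hoodie.io.factory.format", "parquet")
    return _FACTORIES[name]()

print(get_io_factory({}).reader_format())  # parquet
print(get_io_factory({"hoodie.io.factory.format": "orc"}).reader_format())  # orc
```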





[jira] [Updated] (HUDI-7731) Fix usage of new Configuration() in production code

2024-05-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7731:
-
Labels: pull-request-available  (was: )

> Fix usage of new Configuration() in production code
> ---
>
> Key: HUDI-7731
> URL: https://issues.apache.org/jira/browse/HUDI-7731
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> new Configuration() is used in non-test code in several places:
> HoodieParquetDataBlock.java
> Metrics.java
>  





[jira] [Updated] (HUDI-7742) Move Hadoop-dependent reader util classes to hudi-hadoop-common module

2024-05-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7742:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Move Hadoop-dependent reader util classes to hudi-hadoop-common module
> --
>
> Key: HUDI-7742
> URL: https://issues.apache.org/jira/browse/HUDI-7742
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7743) Fix simple mistakes with StoragePath in production code.

2024-05-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7743:
-
Labels: pull-request-available  (was: )

> Fix simple mistakes with StoragePath in production code.
> 
>
> Key: HUDI-7743
> URL: https://issues.apache.org/jira/browse/HUDI-7743
> Project: Apache Hudi
>  Issue Type: Task
>  Components: code-quality
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Fix many simple mistakes with StoragePath such as doing extra conversions, 
> not using util methods etc.
> Don't fix any mistakes in tests for now.





[jira] [Updated] (HUDI-7729) Move ParquetUtils to hudi-hadoop-common

2024-05-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7729:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Move ParquetUtils to hudi-hadoop-common
> ---
>
> Key: HUDI-7729
> URL: https://issues.apache.org/jira/browse/HUDI-7729
> Project: Apache Hudi
>  Issue Type: Task
>  Components: core
>Reporter: Jonathan Vexler
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Move ParquetUtils to hudi-hadoop-common. The methods that are called directly 
> from hudi-common should be abstracted into the base utils class, and OrcUtils 
> should throw a not-implemented exception for them.
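The abstraction described above might look like the following toy model (method name invented): the Parquet variant implements the operation, while the ORC counterpart inherits a not-implemented failure from the base utils class.

```python
# Hypothetical sketch: common operations live on a base utils class; the ORC
# variant raises for operations it does not support yet.
class BaseFileUtils:
    def read_footer_row_count(self, path):
        raise NotImplementedError(f"{type(self).__name__} does not implement this")

class ParquetUtils(BaseFileUtils):
    def read_footer_row_count(self, path):
        return 42  # placeholder for real Parquet footer parsing

class OrcUtils(BaseFileUtils):
    pass  # inherits the not-implemented behavior

print(ParquetUtils().read_footer_row_count("f.parquet"))  # 42
try:
    OrcUtils().read_footer_row_count("f.orc")
except NotImplementedError as e:
    print("orc raises:", e)
```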





[jira] [Updated] (HUDI-5616) Docs update for specifying org.apache.spark.HoodieSparkKryoRegistrar

2024-05-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5616:
-
Labels: pull-request-available  (was: )

> Docs update for specifying org.apache.spark.HoodieSparkKryoRegistrar
> 
>
> Key: HUDI-5616
> URL: https://issues.apache.org/jira/browse/HUDI-5616
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: docs
>Reporter: Ethan Guo
>Assignee: Shiyan Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>
> There is a usability change in [this 
> PR|https://github.com/apache/hudi/pull/7702] that requires a new conf for 
> spark users
> --conf  spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar
> There will be a hit on performance (it was actually always there) if this is 
> not specified.
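As an editor's sketch of how this conf is typically passed (the application class and jar names below are hypothetical placeholders, not from this ticket):

```shell
# Register Hudi's Kryo registrar so Spark's Kryo serialization path stays fast.
# com.example.MyHudiApp and my-hudi-app.jar are placeholders for illustration.
spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar \
  --class com.example.MyHudiApp \
  my-hudi-app.jar
```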



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7726) Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils

2024-05-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7726:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils
> --
>
> Key: HUDI-7726
> URL: https://issues.apache.org/jira/browse/HUDI-7726
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7739) Shutdown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy

2024-05-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7739:
-
Labels: pull-request-available  (was: )

> Shutdown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy
> --
>
> Key: HUDI-7739
> URL: https://issues.apache.org/jira/browse/HUDI-7739
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xinyu Zou
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7738) FileStreamReader need set Charset with UTF-8

2024-05-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7738:
-
Labels: pull-request-available  (was: )

> FileStreamReader need set Charset with UTF-8
> 
>
> Key: HUDI-7738
> URL: https://issues.apache.org/jira/browse/HUDI-7738
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
>
> FileStreamReader need set Charset with UTF-8



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7737) Bump Spark 3.4 version to Spark 3.4.3

2024-05-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7737:
-
Labels: pull-request-available  (was: )

> Bump Spark 3.4 version to Spark 3.4.3
> -
>
> Key: HUDI-7737
> URL: https://issues.apache.org/jira/browse/HUDI-7737
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Geser Dugarov
>Assignee: Geser Dugarov
>Priority: Major
>  Labels: pull-request-available
>
> Spark 3.4.3 has been released: https://github.com/apache/spark/tree/v3.4.3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7735) Remove usage of SerializableConfiguration

2024-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7735:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Remove usage of SerializableConfiguration
> -
>
> Key: HUDI-7735
> URL: https://issues.apache.org/jira/browse/HUDI-7735
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7734) Remove unused FSPermissionDTO

2024-05-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7734:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Remove unused FSPermissionDTO
> -
>
> Key: HUDI-7734
> URL: https://issues.apache.org/jira/browse/HUDI-7734
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7728) Use StorageConfiguration in LockProvider constructors

2024-05-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7728:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Use StorageConfiguration in LockProvider constructors
> -
>
> Key: HUDI-7728
> URL: https://issues.apache.org/jira/browse/HUDI-7728
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7727) Avoid constructAbsolutePathInHadoopPath in hudi-common module

2024-05-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7727:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Avoid constructAbsolutePathInHadoopPath in hudi-common module
> -
>
> Key: HUDI-7727
> URL: https://issues.apache.org/jira/browse/HUDI-7727
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7725) Restructure HFileBootstrapIndex to separate Hadoop-dependent logic

2024-05-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7725:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Restructure HFileBootstrapIndex to separate Hadoop-dependent logic
> --
>
> Key: HUDI-7725
> URL: https://issues.apache.org/jira/browse/HUDI-7725
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7723) DayBasedCompactionStrategy support io bounded

2024-05-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7723:
-
Labels: pull-request-available  (was: )

> DayBasedCompactionStrategy support io bounded
> -
>
> Key: HUDI-7723
> URL: https://issues.apache.org/jira/browse/HUDI-7723
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: compaction
>Reporter: Askwang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7721) Fix broken build on master

2024-05-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7721:
-
Labels: pull-request-available  (was: )

> Fix broken build on master
> --
>
> Key: HUDI-7721
> URL: https://issues.apache.org/jira/browse/HUDI-7721
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Critical
>  Labels: pull-request-available
>
> TestHoodieDeltaStreamer is invalid due to 
> [https://github.com/apache/hudi/pull/11099].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7350) Introduce HoodieIOFactory to abstract the reader and writer implementation

2024-05-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7350:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Introduce HoodieIOFactory to abstract the reader and writer implementation
> --
>
> Key: HUDI-7350
> URL: https://issues.apache.org/jira/browse/HUDI-7350
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Jonathan Vexler
>Priority: Blocker
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7720) Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups

2024-05-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7720:
-
Labels: pull-request-available  (was: )

> Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups
> -
>
> Key: HUDI-7720
> URL: https://issues.apache.org/jira/browse/HUDI-7720
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Attachments: 1280X1280.PNG
>
>
> Job aborted due to stage failure: Task 3 in stage 35.0 failed 4 times, most 
> recent failure: Lost task 3.3 in stage 35.0 (TID 32175) (10-222-33-34.lan 
> executor 204): java.lang.NullPointerException
> at java.util.ArrayList.<init>(ArrayList.java:178)
> at 
> org.apache.hudi.common.table.view.HoodieTableFileSystemView.fetchAllStoredFileGroups(HoodieTableFileSystemView.java:308)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroupsIncludingReplaced(AbstractTableFileSystemView.java:976)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.getReplacedFileGroupsBefore(AbstractTableFileSystemView.java:989)
> at 
> org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:104)
> at 
> org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getReplacedFileGroupsBefore(PriorityBasedFileSystemView.java:232)
> at 
> org.apache.hudi.table.action.clean.CleanPlanner.getReplacedFilesEligibleToClean(CleanPlanner.java:441)
> at 
> org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:330)
> at 
> org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:295)
> at 
> org.apache.hudi.table.action.clean.CleanPlanner.getDeletePaths(CleanPlanner.java:493)
> at 
> org.apache.hudi.table.action.clean.CleanPlanActionExecutor.lambda$requestClean$af5da5d2$1(CleanPlanActionExecutor.java:122)
>  at 
> org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
>  at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
> at scala.collection.Iterator.foreach(Iterator.scala:943)
> at scala.collection.Iterator.foreach$(Iterator.scala:943) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at 
> scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
> at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) 
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
> at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
> at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364) at 
> scala.collection.AbstractIterator.to(Iterator.scala:1431) at 
> scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358) at 
> scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
> at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
> at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339) at 
> scala.collection.AbstractIterator.toArray(Iterator.scala:1431) at 
> org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
> at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303) 
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at 
> org.apache.spark.scheduler.Task.run(Task.scala:131) at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
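The top frame matches the documented behavior of the `ArrayList(Collection)` constructor, which dereferences its argument; a null collection fails exactly there. A standalone reproduction of that failure mode (illustrative only, not Hudi code):

```java
import java.util.ArrayList;
import java.util.Collection;

public class NpeRepro {
    public static void main(String[] args) {
        // Stands in for a missing file-group collection; not a real Hudi value.
        Collection<String> fileGroups = null;
        try {
            // ArrayList(Collection) calls c.toArray(), so a null argument throws NPE.
            new ArrayList<>(fileGroups);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the stack trace above");
        }
    }
}
```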



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7718) Use source profile in HoodieIncrSource

2024-05-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7718:
-
Labels: pull-request-available  (was: )

> Use source profile in HoodieIncrSource
> --
>
> Key: HUDI-7718
> URL: https://issues.apache.org/jira/browse/HUDI-7718
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Vinish Reddy
>Assignee: Vinish Reddy
>Priority: Minor
>  Labels: pull-request-available
>
> Use source profile in HoodieIncrSource for utilising proper parallelism and 
> numInstantsPerFetch based on data volume.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7654) Implement the pre-CBO rules

2024-05-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7654:
-
Labels: pull-request-available  (was: )

> Implement the pre-CBO rules
> ---
>
> Key: HUDI-7654
> URL: https://issues.apache.org/jira/browse/HUDI-7654
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7716) Add more logs around index lookup

2024-05-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7716:
-
Labels: pull-request-available  (was: )

> Add more logs around index lookup
> -
>
> Key: HUDI-7716
> URL: https://issues.apache.org/jira/browse/HUDI-7716
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: index
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7715) Partition TTL for Flink

2024-05-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7715:
-
Labels: pull-request-available  (was: )

> Partition TTL for Flink
> ---
>
> Key: HUDI-7715
> URL: https://issues.apache.org/jira/browse/HUDI-7715
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xi chaomin
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7713) Schema Reconciliation should also re-order fields

2024-05-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7713:
-
Labels: pull-request-available  (was: )

> Schema Reconciliation should also re-order fields
> -
>
> Key: HUDI-7713
> URL: https://issues.apache.org/jira/browse/HUDI-7713
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Timothy Brown
>Assignee: Timothy Brown
>Priority: Major
>  Labels: pull-request-available
>
> Schema reconciliation currently makes sure the incoming schema is 
> compatible with the target, but it can also be used to guarantee a consistent 
> ordering of fields in the schema between commits.
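A minimal sketch of the field re-ordering idea, using plain field-name lists rather than Hudi's actual Avro schema handling (the helper name and logic are illustrative only):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReorderSketch {
    // Hypothetical helper: order incoming field names to follow the target
    // schema's order; fields unknown to the target keep their relative order
    // at the end.
    static List<String> reorderFields(List<String> target, List<String> incoming) {
        List<String> out = new ArrayList<>();
        for (String f : target) {
            if (incoming.contains(f)) out.add(f);
        }
        for (String f : incoming) {
            if (!out.contains(f)) out.add(f);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> target = Arrays.asList("id", "name", "ts");
        List<String> incoming = Arrays.asList("ts", "id", "name", "extra");
        System.out.println(reorderFields(target, incoming)); // [id, name, ts, extra]
    }
}
```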



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7712) Account for file slices instead of just base files while initializing RLI for MOR table

2024-05-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7712:
-
Labels: pull-request-available  (was: )

> Account for file slices instead of just base files while initializing RLI for 
> MOR table
> ---
>
> Key: HUDI-7712
> URL: https://issues.apache.org/jira/browse/HUDI-7712
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> We could have deletes in log files, and hence we need to account for the entire 
> file slice instead of just base files while initializing RLI for a MOR table. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7704) Unify test client storage classes with duplicate code

2024-05-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7704:
-
Labels: pull-request-available  (was: )

> Unify test client storage classes with duplicate code 
> --
>
> Key: HUDI-7704
> URL: https://issues.apache.org/jira/browse/HUDI-7704
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jonathan Vexler
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
>
> TestHoodieClientOnCopyOnWriteStorage
> TestHoodieJavaClientOnCopyOnWriteStorage
> have a bunch of duplicate code



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7711) Fix MultiTableStreamer can deal with path of properties file for each streamer

2024-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7711:
-
Labels: pull-request-available  (was: )

> Fix MultiTableStreamer can deal with path of properties file for each streamer
> --
>
> Key: HUDI-7711
> URL: https://issues.apache.org/jira/browse/HUDI-7711
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hudi-utilities
> Environment: hudi0.14.1, Spark3.2
>Reporter: Jihwan Lee
>Priority: Major
>  Labels: pull-request-available
>
> HoodieMultiTableStreamer initializes common configs, then deep-copies related 
> fields into each stream.
> Because _propsFilePath_ on each streamer is not handled, every streamer falls 
> back to the default value, which is the path of the test properties file.
>  
> Also, when MultiTableStreamer is run with {_}--hoodie-conf{_}, each streamer 
> should be able to inherit these configs.
>  
> MultiTable configs (kafka-source.properties):
>  
> {code:java}
> ...
> hoodie.streamer.ingestion.tablesToBeIngested=db.tbl1,db.tb2
> hoodie.streamer.ingestion.db.tbl1.configFile=hdfs:///tmp/config_1.properties
> hoodie.streamer.ingestion.db.tbl2.configFile=hdfs:///tmp/config_2.properties
> ... {code}
>  
>  
> /tmp/config_1.properties:
>  
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic1
> ... {code}
>  
>  
> /tmp/config_2.properties:
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic2
> ... {code}
>  
> error log (workspace path is replaced with \{RUNNING_PATH}):
>  
> {code:java}
> 24/05/04 21:41:01 ERROR config.DFSPropertiesConfiguration: Error reading in 
> properties from dfs from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
> 24/05/04 21:41:01 INFO streamer.StreamSync: Shutting down embedded timeline 
> server
> 24/05/04 21:41:01 ERROR streamer.HoodieMultiTableStreamer: error while 
> running MultiTableDeltaStreamer for table: review_processed_data
> org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs 
> from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:168)
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:87)
>         at 
> org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:258)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:453)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:714)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:676)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:568)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:540)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:444)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:874)
>         at 
> org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>         at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:216)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.sync(HoodieMultiTableStreamer.java:457)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.main(HoodieMultiTableStreamer.java:282)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache

[jira] [Updated] (HUDI-7710) BugFix: Remove compaction.inflight from conflict resolution

2024-05-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7710:
-
Labels: pull-request-available  (was: )

> BugFix: Remove compaction.inflight from conflict resolution
> ---
>
> Key: HUDI-7710
> URL: https://issues.apache.org/jira/browse/HUDI-7710
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: compaction
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Critical
>  Labels: pull-request-available
>
> During conflict resolution, a compaction.inflight instant may be found; since it 
> does not contain any plan information, this can cause an NPE.
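A hedged sketch of the defensive pattern implied here, with a hypothetical minimal instant model (Hudi's real `HoodieInstant` API differs): skip inflight compactions before ever touching their plans.

```java
import java.util.ArrayList;
import java.util.List;

public class ConflictCandidates {
    // Hypothetical minimal instant model for illustration only.
    static class Instant {
        final String action;
        final String state;
        Instant(String action, String state) { this.action = action; this.state = state; }
    }

    // Filter inflight compactions up front: they carry no plan, so reading a
    // plan from them downstream would dereference null.
    static List<Instant> withoutInflightCompactions(List<Instant> instants) {
        List<Instant> out = new ArrayList<>();
        for (Instant i : instants) {
            if ("compaction".equals(i.action) && "inflight".equals(i.state)) continue;
            out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Instant> timeline = new ArrayList<>();
        timeline.add(new Instant("commit", "completed"));
        timeline.add(new Instant("compaction", "inflight"));
        System.out.println(withoutInflightCompactions(timeline).size()); // 1
    }
}
```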



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7707) Enable bundle validation on Java 8 and 11

2024-05-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7707:
-
Labels: pull-request-available  (was: )

> Enable bundle validation on Java 8 and 11
> -
>
> Key: HUDI-7707
> URL: https://issues.apache.org/jira/browse/HUDI-7707
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: Screenshot 2024-05-02 at 17.41.02.png
>
>
> Bundle validation with Java 8 and 11 is somehow skipped in GH CI.  It 
> should be enabled. !Screenshot 2024-05-02 at 
> 17.41.02.png|width=905,height=325!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7706) Improve validation in PARTITION_STATS index test

2024-05-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7706:
-
Labels: pull-request-available  (was: )

> Improve validation in PARTITION_STATS index test
> 
>
> Key: HUDI-7706
> URL: https://issues.apache.org/jira/browse/HUDI-7706
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> We should add the record key in MDT when validating the partition stats.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


<    1   2   3   4   5   6   7   8   9   10   >