[jira] [Updated] (HUDI-7815) Multiple writers with bulk insert: getAllPendingClusteringPlans should refresh the timeline
[ https://issues.apache.org/jira/browse/HUDI-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7815: - Labels: pull-request-available (was: ) > Multiple writers with bulk insert: getAllPendingClusteringPlans should refresh the > timeline > > > Key: HUDI-7815 > URL: https://issues.apache.org/jira/browse/HUDI-7815 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
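A minimal sketch of the fix direction HUDI-7815 describes, assuming the usual HoodieTableMetaClient#reloadActiveTimeline and ClusteringUtils#getAllPendingClusteringPlans APIs; treat it as illustrative, not the actual patch:
{code:java}
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.util.ClusteringUtils;

public class PendingClusteringScan {
  // Refresh the cached timeline first, so a bulk-insert writer sees
  // clustering plans scheduled by concurrent writers after this meta
  // client was instantiated, instead of working off a stale snapshot.
  public static long countPendingClusteringPlans(HoodieTableMetaClient metaClient) {
    metaClient.reloadActiveTimeline();
    return ClusteringUtils.getAllPendingClusteringPlans(metaClient).count();
  }
}
{code}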
[jira] [Updated] (HUDI-7814) Exclude unused transitive dependencies that introduce vulnerabilities
[ https://issues.apache.org/jira/browse/HUDI-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7814: - Labels: pull-request-available (was: ) > Exclude unused transitive dependencies that introduce vulnerabilities > - > > Key: HUDI-7814 > URL: https://issues.apache.org/jira/browse/HUDI-7814 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7812) Async Clustering w/ row writer fails due to time travel query validation
[ https://issues.apache.org/jira/browse/HUDI-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7812: - Labels: pull-request-available (was: ) > Async Clustering w/ row writer fails due to time travel query validation > > > Key: HUDI-7812 > URL: https://issues.apache.org/jira/browse/HUDI-7812 > Project: Apache Hudi > Issue Type: Bug > Components: clustering >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > With the clustering row writer enabled flow, we trigger a time travel query to > read input records. But the query side fails if there are any pending commits > (due to new ingestion) whose timestamp < clustering instant time. We need to > relax this constraint. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
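A hedged sketch of the relaxation HUDI-7812 asks for; `pendingInstants` and `isInternalTableService` are illustrative stand-ins, not actual Hudi APIs:
{code:java}
import java.util.List;

public class TimeTravelValidation {
  // Instead of rejecting a time-travel read at queryInstant whenever ANY
  // pending instant is earlier, skip the check for internal table services
  // such as clustering, which read their own input at a fixed instant.
  static void validate(String queryInstant, List<String> pendingInstants,
                       boolean isInternalTableService) {
    if (isInternalTableService) {
      return;
    }
    for (String pending : pendingInstants) {
      if (pending.compareTo(queryInstant) < 0) {
        throw new IllegalStateException(
            "Time travel at " + queryInstant + " blocked by pending instant " + pending);
      }
    }
  }
}
{code}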
[jira] [Updated] (HUDI-7810) Fix OptionsResolver#allowCommitOnEmptyBatch default value bug
[ https://issues.apache.org/jira/browse/HUDI-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7810: - Labels: pull-request-available (was: ) > Fix OptionsResolver#allowCommitOnEmptyBatch default value bug > - > > Key: HUDI-7810 > URL: https://issues.apache.org/jira/browse/HUDI-7810 > Project: Apache Hudi > Issue Type: Bug >Reporter: bradley >Priority: Major > Labels: pull-request-available > > Fix OptionsResolver#allowCommitOnEmptyBatch default value bug -- This message was sent by Atlassian Jira (v8.20.10#820010)
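HUDI-7810 gives no detail beyond the title, but the usual shape of such a bug is resolving an option against a hard-coded literal default that diverges from the default declared on the ConfigOption. A hedged Flink-style illustration; the option key below is made up, not Hudi's actual key:
{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.Configuration;

public class DefaultValueResolution {
  // Hypothetical option; the declared default lives on the ConfigOption itself.
  static final ConfigOption<Boolean> ALLOW_COMMIT_ON_EMPTY_BATCH = ConfigOptions
      .key("write.allow.commit.on.empty.batch").booleanType().defaultValue(false);

  static boolean allowCommitOnEmptyBatch(Configuration conf) {
    // Buggy shape: conf.getBoolean(ALLOW_COMMIT_ON_EMPTY_BATCH.key(), true)
    // silently overrides the declared default; resolving via the option
    // object keeps the single declared default authoritative.
    return conf.get(ALLOW_COMMIT_ON_EMPTY_BATCH);
  }
}
{code}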
[jira] [Updated] (HUDI-7808) Security upgrade io.acryl:datahub-client from 0.8.31 to 0.8.45
[ https://issues.apache.org/jira/browse/HUDI-7808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7808: - Labels: pull-request-available (was: ) > Security upgrade io.acryl:datahub-client from 0.8.31 to 0.8.45 > -- > > Key: HUDI-7808 > URL: https://issues.apache.org/jira/browse/HUDI-7808 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7809) Use Spark SerializableConfiguration to avoid NPE in Kryo serde
[ https://issues.apache.org/jira/browse/HUDI-7809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7809: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Use Spark SerializableConfiguration to avoid NPE in Kryo serde > -- > > Key: HUDI-7809 > URL: https://issues.apache.org/jira/browse/HUDI-7809 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > > With Hudi 0.14.1, without > "spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar", the Hudi > query in the Spark quick start guide succeeds. In Hudi 0.15.0-rc2, without the > Kryo registrar, the Hudi read throws an NPE due to HadoopStorageConfiguration. > {code:java} > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2450) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2399) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2398) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2398) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1156) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1156) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1156) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2638) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2580) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2569) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2224) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2245) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2264) > at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:492) > at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:445) > at > org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48) > at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3715) > at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2728) > at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3706) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3704) > at org.apache.spark.sql.Dataset.head(Dataset.scala:2728) > at 
org.apache.spark.sql.Dataset.take(Dataset.scala:2935) > at org.apache.spark.sql.Dataset.getRows(Dataset.scala:287) > at org.apache.spark.sql.Dataset.showString(Dataset.scala:326) > at org.apache.spark.sql.Dataset.show(Dataset.scala:806) > at org.apache.spark.sql.Dataset.show(Dataset.scala:765) > at org.apache.spark.sql.Dataset.show(Dataset.scala:774) > ... 47 elided > Caused by: java.lang.NullPointerException > at > org.apache.spark.sql.execution.datasources.parquet.Spark32LegacyHoodieParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(Spark32LegacyHoodieParquetFileFormat.scala:152) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:187) > at > org.apa
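A minimal sketch of the direction the HUDI-7809 title describes, assuming Spark's org.apache.spark.util.SerializableConfiguration wrapper (which survives both Java and Kryo serde) rather than a custom configuration holder that Kryo may deserialize with null internals:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.util.SerializableConfiguration;

public class ConfSerdeSketch {
  // Wrap the Hadoop Configuration before shipping it to executors; unwrap
  // with serConf.value() on the executor side to get the live Configuration.
  public static SerializableConfiguration wrap(Configuration hadoopConf) {
    return new SerializableConfiguration(hadoopConf);
  }
}
{code}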
[jira] [Updated] (HUDI-7807) spark-sql updates for a pk less table fails w/ partitioned table
[ https://issues.apache.org/jira/browse/HUDI-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7807: - Labels: pull-request-available (was: ) > spark-sql updates for a pk less table fails w/ partitioned table > - > > Key: HUDI-7807 > URL: https://issues.apache.org/jira/browse/HUDI-7807 > Project: Apache Hudi > Issue Type: Bug > Components: spark-sql >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > quick start fails when trying to UPDATE with spark-sql for a pk less table. > > {code:java} > > UPDATE hudi_table4 SET fare = 25.0 WHERE rider = 'rider-D'; > 24/05/28 11:44:41 WARN package: Truncated the string representation of a plan > since it was too large. This behavior can be adjusted by setting > 'spark.sql.debug.maxToStringFields'. > 24/05/28 11:44:41 ERROR SparkSQLDriver: Failed in [UPDATE hudi_table4 SET > fare = 25.0 WHERE rider = 'rider-D'] > org.apache.hudi.exception.HoodieException: Unable to instantiate class > org.apache.hudi.keygen.SimpleKeyGenerator > at > org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:75) > at > org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:123) > at > org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:91) > at > org.apache.hudi.util.SparkKeyGenUtils$.getPartitionColumns(SparkKeyGenUtils.scala:47) > at > org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:218) > at > org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:232) > at > org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:187) > at > org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:125) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:168) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) > at > 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.trees.TreeNod
[jira] [Updated] (HUDI-7805) FileSystemBasedLockProvider should automatically delete the lock file on lock conflict to avoid failing the next write
[ https://issues.apache.org/jira/browse/HUDI-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7805: - Labels: pull-request-available (was: ) > FileSystemBasedLockProvider should automatically delete the lock file on lock > conflict to avoid failing the next write > -- > > Key: HUDI-7805 > URL: https://issues.apache.org/jira/browse/HUDI-7805 > Project: Apache Hudi > Issue Type: Improvement > Components: multi-writer >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > > org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock > object hdfs://aa-region/region04/2211/warehouse/hudi/odsmon_log/.hoodie/lock > at > org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:100) > at > org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:58) > at > org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1258) > at > org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1301) > at > org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:139) > at > org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:216) > at > org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:396) > at > org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:108) > at > org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.run(InsertIntoHoodieTableCommand.scala:61) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > 
at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) > at > org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106) > at > org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93) > at > org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91) > at org.apache.spark.sql.Dataset.(Dataset.scala:219) > at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) > at org.apache.spark.sql.SparkSession.withActive(SparkSess
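A hedged sketch of the cleanup HUDI-7805 proposes, not the actual FileSystemBasedLockProvider code: when acquisition keeps failing because a dead writer left its lock file behind, reclaim the file once it is older than an expiry window:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StaleLockCleanup {
  // Delete the lock file if its owner appears gone (no modification within
  // expireMs), so the next write attempt is not permanently blocked.
  static void cleanIfStale(Path lockFile, long expireMs, Configuration conf) throws IOException {
    FileSystem fs = lockFile.getFileSystem(conf);
    if (fs.exists(lockFile)
        && System.currentTimeMillis() - fs.getFileStatus(lockFile).getModificationTime() > expireMs) {
      fs.delete(lockFile, false);
    }
  }
}
{code}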
[jira] [Updated] (HUDI-7804) Improve flink bucket index partitioner
[ https://issues.apache.org/jira/browse/HUDI-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7804: - Labels: pull-request-available (was: ) > Improve flink bucket index partitioner > -- > > Key: HUDI-7804 > URL: https://issues.apache.org/jira/browse/HUDI-7804 > Project: Apache Hudi > Issue Type: Bug >Reporter: xi chaomin >Priority: Major > Labels: pull-request-available > > https://github.com/apache/hudi/issues/11288 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7507) ongoing concurrent writers with smaller timestamp can cause issues with table services
[ https://issues.apache.org/jira/browse/HUDI-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7507: - Labels: pull-request-available (was: ) > ongoing concurrent writers with smaller timestamp can cause issues with > table services > --- > > Key: HUDI-7507 > URL: https://issues.apache.org/jira/browse/HUDI-7507 > Project: Apache Hudi > Issue Type: Improvement > Components: table-service >Reporter: Krishen Bhan >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0 > > Attachments: Flowchart (1).png, Flowchart.png > > > *Scenarios:* > Although HUDI operations hold a table lock when creating a .requested > instant, because HUDI writers do not generate a timestamp and create a > .requested plan in the same transaction, there can be a scenario where > # Job 1 starts, chooses timestamp (x), Job 2 starts and chooses timestamp > (x - 1) > # Job 1 schedules and creates requested file with instant timestamp (x) > # Job 2 schedules and creates requested file with instant timestamp (x-1) > # Both jobs continue running > If one job is writing a commit and the other is a table service, this can > cause issues: > ** If Job 2 is an ingestion commit and Job 1 is compaction/log compaction, then > Job 1 can run before Job 2 and create a compaction plan for all instant > times (up to (x)) that doesn't include instant time (x-1). Later Job 2 > will create instant time (x-1), but the timeline will be in a corrupted state > since the compaction plan was supposed to include (x-1) > ** There is a similar issue with clean. If Job 2 is a long-running commit > (that was stuck/delayed for a while before creating its .requested plan) and > Job 1 is a clean, then Job 1 can perform a clean that updates the > earliest-commit-to-retain without waiting for the inflight instant by Job 2 > at (x-1) to complete. This causes Job 2 to be "skipped" by clean. > ** If the completed commit files include some sort of "checkpointing" with > another "downstream job" performing incremental reads on this dataset (such > as Hoodie Streamer/DeltaSync) then there may be incorrect behavior, such as > the incremental reader skipping some completed commits (that have a smaller > instant timestamp than the latest completed commit but were created after). > [Edit] I added a diagram to visualize the issue, specifically the second > scenario with clean > !Flowchart (1).png! > *Proposed approach:* > One way this can be resolved is by combining the operations of generating > instant time and creating a requested file in the same HUDI table > transaction. Specifically, executing the following steps whenever any instant > (commit, table service, etc) is scheduled > Approach A > # Acquire table lock > # Look at the latest instant C on the active timeline (completed or not). > Generate a timestamp after C > # Create the plan and requested file using this new timestamp (that is > greater than C) > # Release table lock > Unfortunately (A) has the following drawbacks > * Every operation must now hold the table lock when computing its plan even > if it's an expensive operation and will take a while > * Users of HUDI cannot easily set their own instant time of an operation, > and this restriction would break any public APIs that allow this and would > require deprecating those APIs. > > An alternate approach is to have every operation abort creating a .requested > file unless it has the latest timestamp. 
Specifically, for any instant type, > whenever an operation is about to create a .requested plan on the timeline, it > should take the table lock and assert that there are no other instants on the > timeline that are greater than it that could cause a conflict. If that > assertion fails, then throw a retry-able conflict resolution exception. > Specifically, the following steps should be followed whenever any instant > (commit, table service, etc) is scheduled > Approach B > # Acquire table lock. Assume that the desired instant time C and requested > file plan metadata have already been created, regardless of whether it was > before this step or right after acquiring the table lock. > # If there are any instants on the timeline that are greater than C > (regardless of their operation type or state) then release the table lock > and throw an exception > # Create requested plan on timeline (as usual) > # Release table lock > Unlike (A), thi
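A hedged sketch of Approach B as just described; `tableLock`, `instantsOnTimeline`, and `createRequestedFile` are illustrative stand-ins for Hudi's transaction manager and timeline APIs:
{code:java}
import java.util.List;
import java.util.concurrent.locks.Lock;

public class ScheduleWithLatestTimestampCheck {
  static void schedule(Lock tableLock, List<String> instantsOnTimeline,
                       String instantTime, Runnable createRequestedFile) {
    tableLock.lock();
    try {
      // Abort unless this instant is newer than everything on the timeline,
      // regardless of the other instants' action type or state.
      for (String other : instantsOnTimeline) {
        if (other.compareTo(instantTime) > 0) {
          throw new IllegalStateException(
              "Retryable conflict: newer instant " + other + " already on timeline");
        }
      }
      createRequestedFile.run(); // create the .requested plan under the lock
    } finally {
      tableLock.unlock();
    }
  }
}
{code}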
[jira] [Updated] (HUDI-7655) Support configuration for clean to fail execution if at least one file is marked as a failed delete
[ https://issues.apache.org/jira/browse/HUDI-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7655: - Labels: clean pull-request-available (was: clean) > Support configuration for clean to fail execution if at least one > file is marked as a failed delete > > > Key: HUDI-7655 > URL: https://issues.apache.org/jira/browse/HUDI-7655 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Krishen Bhan >Assignee: sivabalan narayanan >Priority: Minor > Labels: clean, pull-request-available > > When a HUDI clean plan is executed, any targeted file that was not confirmed > as deleted (or non-existing) will be marked as a "failed delete". Although > these failed deletes will be added to `.clean` metadata, if incremental clean > is used then these files might not ever be picked up again by a future clean > plan, unless a "full-scan" clean ends up being scheduled. In addition to > leading to more files unnecessarily taking up storage space for longer, this > can lead to the following dataset consistency issue for COW datasets: > # Insert at C1 creates file group f1 in partition > # Replacecommit at RC2 creates file group f2 in partition, and replaces f1 > # Any reader of the partition that calls the HUDI API (with or without using MDT) > will recognize that f1 should be ignored, as it has been replaced. This is > because the RC2 instant file is in the active timeline > # Some completed instants later, an incremental clean is scheduled. It moves > the "earliest commit to retain" to a time after instant time RC2, so it > targets f1 for deletion. But during execution of the plan, it fails to delete > f1. > # An archive job is eventually triggered, and archives C1 and RC2. Note that > f1 is still in the partition > At this point, any job/query that reads the aforementioned partition directly > from the DFS file system calls (without directly using the MDT FILES partition) > will consider both f1 and f2 as valid file groups, since RC2 is no longer in > the active timeline. This is a data consistency issue, and will only be resolved > if a "full-scan" clean is triggered and deletes f1. > This specific scenario can be avoided if the user can configure HUDI clean to > fail execution of a clean plan unless all files are confirmed as deleted (or > not existing in DFS already), "blocking" the clean. The next clean attempt > will re-execute this existing plan, since clean plans cannot be "rolled > back". -- This message was sent by Atlassian Jira (v8.20.10#820010)
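A minimal sketch of the proposed configuration, assuming a hypothetical `failOnFailedDeletes` flag; failing here leaves the plan pending, so the next clean attempt re-executes it instead of silently marking it complete:
{code:java}
import java.util.List;

public class StrictCleanCheck {
  // After executing a clean plan, surface failed deletes as a hard failure
  // when the strict flag is on, "blocking" the clean as the ticket describes.
  static void verify(List<String> failedDeleteFiles, boolean failOnFailedDeletes) {
    if (failOnFailedDeletes && !failedDeleteFiles.isEmpty()) {
      throw new IllegalStateException(
          "Clean left " + failedDeleteFiles.size() + " files undeleted: " + failedDeleteFiles);
    }
  }
}
{code}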
[jira] [Updated] (HUDI-7802) Fix bundle validation scripts
[ https://issues.apache.org/jira/browse/HUDI-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7802: - Labels: pull-request-available (was: ) > Fix bundle validation scripts > - > > Key: HUDI-7802 > URL: https://issues.apache.org/jira/browse/HUDI-7802 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > Issues: > * Bundle validation with packaging/bundle-validation/ci_run.sh fails for > release-0.15.0 branch due to script issue > * scripts/release/validate_staged_bundles.sh needs to include additional > bundles. > * Add release candidate validation on scala 2.13 bundles. > * Disable release candidate validation by default. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7801) Directly pass down HoodieStorage instance instead of recreation
[ https://issues.apache.org/jira/browse/HUDI-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7801: - Labels: pull-request-available (was: ) > Directly pass down HoodieStorage instance instead of recreation > --- > > Key: HUDI-7801 > URL: https://issues.apache.org/jira/browse/HUDI-7801 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > There are places that use HoodieStorage#newInstance to recreate HoodieStorage > instance which may not be necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7799) Optimize the access modifier of AbstractHoodieLogRecordReader#processNextRecord
[ https://issues.apache.org/jira/browse/HUDI-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7799: - Labels: pull-request-available (was: ) > Optimize the access modifier of > AbstractHoodieLogRecordReader#processNextRecord > --- > > Key: HUDI-7799 > URL: https://issues.apache.org/jira/browse/HUDI-7799 > Project: Apache Hudi > Issue Type: Improvement >Reporter: bradley >Priority: Major > Labels: pull-request-available > > Correct the access modifier of the processNextRecord member method of the > Scanner class -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7798) Mark configs included in 0.15.0 release
[ https://issues.apache.org/jira/browse/HUDI-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7798: - Labels: pull-request-available (was: ) > Mark configs included in 0.15.0 release > --- > > Key: HUDI-7798 > URL: https://issues.apache.org/jira/browse/HUDI-7798 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > We need to mark the configs that go out in 0.15.0 release with > `.sinceVersion("0.15.0")`. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7797) Use HoodieIOFactory to return pluggable FileFormatUtils implementation
[ https://issues.apache.org/jira/browse/HUDI-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7797: - Labels: pull-request-available (was: ) > Use HoodieIOFactory to return pluggable FileFormatUtils implementation > -- > > Key: HUDI-7797 > URL: https://issues.apache.org/jira/browse/HUDI-7797 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
(hudi) branch dependabot/maven/org.apache.hive-hive-service-2.3.3 deleted (was e7e4b9e3ddf)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/org.apache.hive-hive-service-2.3.3 in repository https://gitbox.apache.org/repos/asf/hudi.git was e7e4b9e3ddf Update pom.xml The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
[jira] [Updated] (HUDI-7796) Gracefully cast file system instance in Avro writers
[ https://issues.apache.org/jira/browse/HUDI-7796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7796: - Labels: pull-request-available (was: ) > Gracefully cast file system instance in Avro writers > > > Key: HUDI-7796 > URL: https://issues.apache.org/jira/browse/HUDI-7796 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Labels: pull-request-available > > When running tests in Trino with Hudi MDT enabled, the following line in > HoodieAvroHFileWriter throws a ClassCastException: > {code:java} > this.fs = (HoodieWrapperFileSystem) this.file.getFileSystem(conf); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
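A hedged sketch of the graceful alternative to the blind cast quoted above, assuming the wrapper class at its usual org.apache.hudi.common.fs location: probe the runtime type and let the caller branch when the wrapper is absent (as under Trino), instead of hitting a ClassCastException:
{code:java}
import java.io.IOException;
import java.util.Optional;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hudi.common.fs.HoodieWrapperFileSystem;

public class GracefulFsResolution {
  // Returns the wrapper when present, empty otherwise, so callers can fall
  // back to plain FileSystem I/O rather than failing on an unwrapped instance.
  static Optional<HoodieWrapperFileSystem> tryWrapper(Path file, Configuration conf)
      throws IOException {
    FileSystem raw = file.getFileSystem(conf);
    return raw instanceof HoodieWrapperFileSystem
        ? Optional.of((HoodieWrapperFileSystem) raw)
        : Optional.empty();
  }
}
{code}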
[jira] [Updated] (HUDI-7794) Bump org.apache.hive:hive-service from 2.3.1 to 2.3.3
[ https://issues.apache.org/jira/browse/HUDI-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7794: - Labels: pull-request-available (was: ) > Bump org.apache.hive:hive-service from 2.3.1 to 2.3.3 > - > > Key: HUDI-7794 > URL: https://issues.apache.org/jira/browse/HUDI-7794 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7795) Fix loading of input splits from lookup table reader
[ https://issues.apache.org/jira/browse/HUDI-7795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7795: - Labels: pull-request-available (was: ) > Fix loading of input splits from lookup table reader > - > > Key: HUDI-7795 > URL: https://issues.apache.org/jira/browse/HUDI-7795 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
(hudi) branch dependabot/maven/hudi-platform-service/hudi-metaserver/com.h2database-h2-2.2.220 deleted (was 1560e9aa30c)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/hudi-platform-service/hudi-metaserver/com.h2database-h2-2.2.220 in repository https://gitbox.apache.org/repos/asf/hudi.git was 1560e9aa30c Bump h2 in /hudi-platform-service/hudi-metaserver The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
(hudi) branch dependabot/maven/packaging/hudi-metaserver-server-bundle/com.h2database-h2-2.2.220 deleted (was cb331f02ad6)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/packaging/hudi-metaserver-server-bundle/com.h2database-h2-2.2.220 in repository https://gitbox.apache.org/repos/asf/hudi.git was cb331f02ad6 Bump h2 in /packaging/hudi-metaserver-server-bundle The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
[jira] [Updated] (HUDI-7792) Bump h2 from 1.4.200 to 2.2.220 in /hudi-platform-service/hudi-metaserver
[ https://issues.apache.org/jira/browse/HUDI-7792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7792: - Labels: pull-request-available (was: ) > Bump h2 from 1.4.200 to 2.2.220 in /hudi-platform-service/hudi-metaserver > - > > Key: HUDI-7792 > URL: https://issues.apache.org/jira/browse/HUDI-7792 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7791) Bump h2 from 1.4.200 to 2.2.220 in /packaging/hudi-metaserver-server-bundle
[ https://issues.apache.org/jira/browse/HUDI-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7791: - Labels: pull-request-available (was: ) > Bump h2 from 1.4.200 to 2.2.220 in /packaging/hudi-metaserver-server-bundle > --- > > Key: HUDI-7791 > URL: https://issues.apache.org/jira/browse/HUDI-7791 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
(hudi) branch dependabot/maven/hive.version-3.1.2 deleted (was ac3ae4a66fa)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2 in repository https://gitbox.apache.org/repos/asf/hudi.git was ac3ae4a66fa Merge branch 'master' into dependabot/maven/hive.version-3.1.2 The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
(hudi) branch dependabot/maven/hive.version-3.1.2 created (now 14e3d559bd0)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2 in repository https://gitbox.apache.org/repos/asf/hudi.git at 14e3d559bd0 Bump hive.version from 2.3.1 to 3.1.2 No new revisions were added by this update.
(hudi) branch dependabot/maven/hive.version-3.1.2 deleted (was 14e3d559bd0)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2 in repository https://gitbox.apache.org/repos/asf/hudi.git was 14e3d559bd0 Bump hive.version from 2.3.1 to 3.1.2 The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
(hudi) branch dependabot/maven/hive.version-3.1.2 created (now 14e3d559bd0)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2 in repository https://gitbox.apache.org/repos/asf/hudi.git at 14e3d559bd0 Bump hive.version from 2.3.1 to 3.1.2 No new revisions were added by this update.
(hudi) branch dependabot/maven/org.apache.hive-hive-service-2.3.3 created (now da65366d649)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/org.apache.hive-hive-service-2.3.3 in repository https://gitbox.apache.org/repos/asf/hudi.git at da65366d649 Bump org.apache.hive:hive-service from 2.3.1 to 2.3.3 No new revisions were added by this update.
(hudi) branch dependabot/maven/hive.version-3.1.2 deleted (was 14e3d559bd0)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2 in repository https://gitbox.apache.org/repos/asf/hudi.git was 14e3d559bd0 Bump hive.version from 2.3.1 to 3.1.2 The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
(hudi) branch dependabot/maven/hive.version-3.1.2 created (now 14e3d559bd0)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/hive.version-3.1.2 in repository https://gitbox.apache.org/repos/asf/hudi.git at 14e3d559bd0 Bump hive.version from 2.3.1 to 3.1.2 No new revisions were added by this update.
(hudi) branch dependabot/maven/hudi-platform-service/hudi-metaserver/com.h2database-h2-2.2.220 updated (e96f60a4406 -> 1560e9aa30c)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/hudi-platform-service/hudi-metaserver/com.h2database-h2-2.2.220 in repository https://gitbox.apache.org/repos/asf/hudi.git discard e96f60a4406 Bump h2 in /hudi-platform-service/hudi-metaserver add ddaef8feddb [HUDI-5101] Adding spark-structured streaming test support via spark-submit job (#7074) add e2dfb465f13 [HUDI-7495] Bump mysql-connector-java from 8.0.22 to 8.0.28 in /hudi-platform-service/hudi-metaserver/hudi-metaserver-server (#7674) add e6664159bed [HUDI-7163] Fix not parsable text DateTimeParseException when compact (#10220) add 3698d49383b [HUDI-7496] Bump mybatis from 3.4.6 to 3.5.6 in /hudi-platform-service/hudi-metaserver/hudi-metaserver-server (#7673) add 819788f8651 [MINOR] Remove repetitive words in docs (#10844) add ee11b9c951c [HUDI-7489] Avoid collecting WriteStatus to driver in row writer code path (#10836) add 130498708bb add job context (#10848) add 8bc9a4bc875 [HUDI-7478] Fix max delta commits guard check w/ MDT (#10820) add f8f12ba9ef3 [MINOR] Fix and enable test TestHoodieDeltaStreamer.testJdbcSourceIncrementalFetchInContinuousMode (#10867) add aac6b2e5486 [HUDI-7382] Get partitions from active timeline instead of listing when building clustering plan (#10621) add ca2140e2003 [MINOR] rename KeyGenUtils#enableAutoGenerateRecordKeys (#10871) add 3c8488b831c [HUDI-7506] Compute offsetRanges based on eventsPerPartition allocated in each range (#10869) add e726306cf09 [HUDI-7466] Add parallel listing of existing partitions in Glue Catalog sync (#10460) add 2dcdd311245 [HUDI-7421] Build HoodieDeltaWriteStat using HoodieDeltaWriteStat#copy (#10870) add b7ccecf3205 [HUDI-7492] Fix the incorrect keygenerator specification for multi partition or multi primary key tables creation (#10840) add 7631e0dcb89 [MINOR] Add Hudi icon for idea (#10880) add 784af0e1786 [HUDI-7514] Update Manifest file after the parquet writer closed in LSMTimelineWriter (#10883) add 7c55ac35ba1 [HUDI-7516] Put jdbc-h2 creds into static variables for hudi-utilities tests (#10889) add 135db099afc [MINOR] Remove redundant fileId from HoodieAppendHandle (#10901) add 5a21a1dd260 [HUDI-7529] Resolve hotspots in stream read (#10911) add 47151f653d8 [HUDI-7487] Fixed test with in-memory index by proper heap clearing (#10910) add 6be7205a1e3 [MINOR] Refactored `@Before*` and `@After*` in `HoodieDeltaStreamerTestBase` (#10912) add a8e9db446c3 [HUDI-7530] Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables (#10908) add f98a40bd369 [HUDI-7499] Support FirstValueAvroPayload for Hudi (#10857) add da9660bf38a checkstyle (#10919) add d749457f9d5 [HUDI-7513] Add jackson-module-scala to spark bundle (#10877) add d22bfba08fd [MINOR] Restore the setMaxParallelism setting for HoodieTableSource.produceDataStream (#10925) add 5e4a6c650e0 [HUDI-7531] Consider pending clustering when scheduling a new clustering plan (#10923) add 8a137631da8 [HUDI-7518] Fix HoodieMetadataPayload merging logic around repeated deletes (#10913) add 136d0755ad7 [HUDI-7500] fix gaps with deduce schema and null schema (#10858) add 28f67ff3561 [HUDI-7551] Avoid loading all partitions in CleanPlanner when MDT is enabled (#10928) add 4741ba06462 [HUDI-6317] Streaming read should skip compaction and clustering instants to avoid duplicates (#8884) add ae4f46874a9 [MINOR} When M3 metrics reporter type is used HoodieMetricsConfig should create default values for HoodieMetricsM3Config (#10936) add 06d3bb8cfbd 
[HUDI-6884] hudi-cli should generate correct HoodieTimeGeneratorConfig (#10941) add 26c00a3adef [HUDI-7187] Fix integ test props to honor new streamer properties (#10866) add 9b094e628d6 [HUDI-7510] Loosen the compaction scheduling and rollback check for MDT (#10874) add 44ab6f32bff [HUDI-6538] Refactor methods in TimelineDiffHelper class (#10938) add 9efced37f81 [HUDI-7557] Fix incremental cleaner when commit for savepoint removed (#10946) add bb51aca75d0 [MINOR] Upgrade mockito to 3.12.4 (#10953) add 8bb6bee6234 [HUDI-7564] Fix HiveSyncConfig inconsistency (#10951) add 59e32b7e686 [HUDI-7569] [RLI] Fix wrong result generated by query (#10955) add bf723f56cd0 [HUDI-7486] Classify schema exceptions when converting from avro to spark row representation (#10778) add 398c9a23c84 [HUDI-7564] Revert hive sync inconsistency and reason for it (#10959) add 8b61696f158 [HUDI-7556] Fixing MDT validator and adding tests (#10939) add 19c20e4dd93 [HUDI-7571] Add api to get exception details in HoodieMetadataTableValidator with ignoreFailed mode (#10960) add bac6ea7b26b [MINOR] Removed FSUtils.makeBaseFileName without fileExt param (#10963) add d41541cb9f8
(hudi) branch dependabot/maven/packaging/hudi-metaserver-server-bundle/com.h2database-h2-2.2.220 updated (b10b98e5d8f -> cb331f02ad6)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/packaging/hudi-metaserver-server-bundle/com.h2database-h2-2.2.220 in repository https://gitbox.apache.org/repos/asf/hudi.git discard b10b98e5d8f Bump h2 in /packaging/hudi-metaserver-server-bundle add ddaef8feddb [HUDI-5101] Adding spark-structured streaming test support via spark-submit job (#7074) add e2dfb465f13 [HUDI-7495] Bump mysql-connector-java from 8.0.22 to 8.0.28 in /hudi-platform-service/hudi-metaserver/hudi-metaserver-server (#7674) add e6664159bed [HUDI-7163] Fix not parsable text DateTimeParseException when compact (#10220) add 3698d49383b [HUDI-7496] Bump mybatis from 3.4.6 to 3.5.6 in /hudi-platform-service/hudi-metaserver/hudi-metaserver-server (#7673) add 819788f8651 [MINOR] Remove repetitive words in docs (#10844) add ee11b9c951c [HUDI-7489] Avoid collecting WriteStatus to driver in row writer code path (#10836) add 130498708bb add job context (#10848) add 8bc9a4bc875 [HUDI-7478] Fix max delta commits guard check w/ MDT (#10820) add f8f12ba9ef3 [MINOR] Fix and enable test TestHoodieDeltaStreamer.testJdbcSourceIncrementalFetchInContinuousMode (#10867) add aac6b2e5486 [HUDI-7382] Get partitions from active timeline instead of listing when building clustering plan (#10621) add ca2140e2003 [MINOR] rename KeyGenUtils#enableAutoGenerateRecordKeys (#10871) add 3c8488b831c [HUDI-7506] Compute offsetRanges based on eventsPerPartition allocated in each range (#10869) add e726306cf09 [HUDI-7466] Add parallel listing of existing partitions in Glue Catalog sync (#10460) add 2dcdd311245 [HUDI-7421] Build HoodieDeltaWriteStat using HoodieDeltaWriteStat#copy (#10870) add b7ccecf3205 [HUDI-7492] Fix the incorrect keygenerator specification for multi partition or multi primary key tables creation (#10840) add 7631e0dcb89 [MINOR] Add Hudi icon for idea (#10880) add 784af0e1786 [HUDI-7514] Update Manifest file after the parquet writer closed in LSMTimelineWriter (#10883) add 7c55ac35ba1 [HUDI-7516] Put jdbc-h2 creds into static variables for hudi-utilities tests (#10889) add 135db099afc [MINOR] Remove redundant fileId from HoodieAppendHandle (#10901) add 5a21a1dd260 [HUDI-7529] Resolve hotspots in stream read (#10911) add 47151f653d8 [HUDI-7487] Fixed test with in-memory index by proper heap clearing (#10910) add 6be7205a1e3 [MINOR] Refactored `@Before*` and `@After*` in `HoodieDeltaStreamerTestBase` (#10912) add a8e9db446c3 [HUDI-7530] Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables (#10908) add f98a40bd369 [HUDI-7499] Support FirstValueAvroPayload for Hudi (#10857) add da9660bf38a checkstyle (#10919) add d749457f9d5 [HUDI-7513] Add jackson-module-scala to spark bundle (#10877) add d22bfba08fd [MINOR] Restore the setMaxParallelism setting for HoodieTableSource.produceDataStream (#10925) add 5e4a6c650e0 [HUDI-7531] Consider pending clustering when scheduling a new clustering plan (#10923) add 8a137631da8 [HUDI-7518] Fix HoodieMetadataPayload merging logic around repeated deletes (#10913) add 136d0755ad7 [HUDI-7500] fix gaps with deduce schema and null schema (#10858) add 28f67ff3561 [HUDI-7551] Avoid loading all partitions in CleanPlanner when MDT is enabled (#10928) add 4741ba06462 [HUDI-6317] Streaming read should skip compaction and clustering instants to avoid duplicates (#8884) add ae4f46874a9 [MINOR} When M3 metrics reporter type is used HoodieMetricsConfig should create default values for HoodieMetricsM3Config (#10936) add 
06d3bb8cfbd [HUDI-6884] hudi-cli should generate correct HoodieTimeGeneratorConfig (#10941) add 26c00a3adef [HUDI-7187] Fix integ test props to honor new streamer properties (#10866) add 9b094e628d6 [HUDI-7510] Loosen the compaction scheduling and rollback check for MDT (#10874) add 44ab6f32bff [HUDI-6538] Refactor methods in TimelineDiffHelper class (#10938) add 9efced37f81 [HUDI-7557] Fix incremental cleaner when commit for savepoint removed (#10946) add bb51aca75d0 [MINOR] Upgrade mockito to 3.12.4 (#10953) add 8bb6bee6234 [HUDI-7564] Fix HiveSyncConfig inconsistency (#10951) add 59e32b7e686 [HUDI-7569] [RLI] Fix wrong result generated by query (#10955) add bf723f56cd0 [HUDI-7486] Classify schema exceptions when converting from avro to spark row representation (#10778) add 398c9a23c84 [HUDI-7564] Revert hive sync inconsistency and reason for it (#10959) add 8b61696f158 [HUDI-7556] Fixing MDT validator and adding tests (#10939) add 19c20e4dd93 [HUDI-7571] Add api to get exception details in HoodieMetadataTableValidator with ignoreFailed mode (#10960) add bac6ea7b26b [MINOR] Removed FSUtils.makeBaseFileName without fileExt param (#10963) add d41541cb9f8
[jira] [Updated] (HUDI-7790) Revert changes in DFSPathSelector and UtilHelpers.readConfig
[ https://issues.apache.org/jira/browse/HUDI-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7790: - Labels: pull-request-available (was: ) > Revert changes in DFSPathSelector and UtilHelpers.readConfig > > > Key: HUDI-7790 > URL: https://issues.apache.org/jira/browse/HUDI-7790 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > This is to avoid behavior changes in DFSPathSelector and keep the > UtilHelpers.readConfig API the same as before. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7788) Fixing exception handling in AverageRecordSizeUtils
[ https://issues.apache.org/jira/browse/HUDI-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7788: - Labels: pull-request-available (was: ) > Fixing exception handling in AverageRecordSizeUtils > --- > > Key: HUDI-7788 > URL: https://issues.apache.org/jira/browse/HUDI-7788 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > We should catch Throwable to avoid any issue during record size estimation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7787) Reload the data for the lookup table when a newer commit instant is found
[ https://issues.apache.org/jira/browse/HUDI-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7787: - Labels: pull-request-available (was: ) > Reload the data for the lookup table when a newer commit instant is found > - > > Key: HUDI-7787 > URL: https://issues.apache.org/jira/browse/HUDI-7787 > Project: Apache Hudi > Issue Type: Improvement >Reporter: hehuiyuan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
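A hedged sketch of the reload trigger HUDI-7787 describes; `latestCompletedInstant` and `reloadData` are illustrative, not Hudi's actual lookup-table API:
{code:java}
public class LookupTableRefresher {
  // Cache the commit instant the lookup table was built from, and rebuild
  // only when the timeline shows a newer completed commit.
  private String loadedInstant = "";

  void refreshIfStale(String latestCompletedInstant, Runnable reloadData) {
    if (latestCompletedInstant.compareTo(loadedInstant) > 0) {
      reloadData.run(); // re-read the table snapshot at the newer instant
      loadedInstant = latestCompletedInstant;
    }
  }
}
{code}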
[jira] [Updated] (HUDI-7786) Fix roaring bitmap dependency in hudi-integ-test-bundle
[ https://issues.apache.org/jira/browse/HUDI-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7786: - Labels: pull-request-available (was: ) > Fix roaring bitmap dependency in hudi-integ-test-bundle > --- > > Key: HUDI-7786 > URL: https://issues.apache.org/jira/browse/HUDI-7786 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7785) Keep public APIs in utilities module the same as before HoodieStorage abstraction
[ https://issues.apache.org/jira/browse/HUDI-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7785: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Keep public APIs in utilities module the same as before HoodieStorage > abstraction > - > > Key: HUDI-7785 > URL: https://issues.apache.org/jira/browse/HUDI-7785 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > > BaseErrorTableWriter, HoodieStreamer, StreamSync, etc., are public API > classes and contain public API methods, which should be kept the same as > before. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4491) Re-enable TestHoodieFlinkQuickstart
[ https://issues.apache.org/jira/browse/HUDI-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4491: - Labels: pull-request-available (was: ) > Re-enable TestHoodieFlinkQuickstart > > > Key: HUDI-4491 > URL: https://issues.apache.org/jira/browse/HUDI-4491 > Project: Apache Hudi > Issue Type: Bug >Reporter: Shawn Chang >Priority: Major > Labels: pull-request-available > > This test was disabled before due to its flakiness. We need to re-enable it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7784) Fix serde of HoodieHadoopConfiguration in Spark
[ https://issues.apache.org/jira/browse/HUDI-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7784: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Fix serde of HoodieHadoopConfiguration in Spark > --- > > Key: HUDI-7784 > URL: https://issues.apache.org/jira/browse/HUDI-7784 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7783) Fix connection leak in FileSystemBasedLockProvider
[ https://issues.apache.org/jira/browse/HUDI-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7783: - Labels: pull-request-available (was: ) > Fix connection leak in FileSystemBasedLockProvider > -- > > Key: HUDI-7783 > URL: https://issues.apache.org/jira/browse/HUDI-7783 > Project: Apache Hudi > Issue Type: Improvement > Components: core >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > > Fix connection leak in FileSystemBasedLockProvider -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7777) Allow HoodieTableMetaClient to take HoodieStorage instance directly
[ https://issues.apache.org/jira/browse/HUDI-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7777: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Allow HoodieTableMetaClient to take HoodieStorage instance directly > > > Key: HUDI-7777 > URL: https://issues.apache.org/jira/browse/HUDI-7777 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > > We need the functionality for the meta client to take a HoodieStorage instance directly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7774) MercifulJsonConverter should support Avro logical type
[ https://issues.apache.org/jira/browse/HUDI-7774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7774: - Labels: pull-request-available (was: ) > MercifulJsonConverter should support Avro logical type > -- > > Key: HUDI-7774 > URL: https://issues.apache.org/jira/browse/HUDI-7774 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Davis Zhang >Priority: Major > Labels: pull-request-available > Original Estimate: 168h > Remaining Estimate: 168h > > MercifulJsonConverter should be able to convert raw JSON string entries to > Avro GenericRecord whose format is compliant with the required Avro schema. > > The list of conversions we should support, with accepted inputs (the Date case is sketched below): > * UUID: String > * Decimal: Number, Number with String representation > * Date: Either Number / String Number or human-readable timestamp in > DateTimeFormatter.ISO_LOCAL_DATE format > * Time (milli/micro sec): Number / String Number or human-readable timestamp > in > DateTimeFormatter.ISO_LOCAL_TIME format > * Timestamp (milli/micro second): Number / String Number or human-readable > timestamp in DateTimeFormatter.ISO_INSTANT format > * Local Timestamp (milli/micro second): Number / String Number or human-readable > timestamp in DateTimeFormatter.ISO_LOCAL_DATE_TIME format -- This message was sent by Atlassian Jira (v8.20.10#820010)
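To make the Date row of that matrix concrete, here is a hedged, self-contained illustration of normalizing the three accepted inputs to Avro's `date` logical type (an int of days since the Unix epoch); it is not the converter's actual code:
{code:java}
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateLogicalTypeConversion {
  // Accepts a Number (e.g. 19876), a stringified number ("19876"), or an
  // ISO_LOCAL_DATE string ("2024-06-02") and normalizes all three to the
  // epoch-day int that the Avro `date` logical type stores.
  static int toAvroDate(Object jsonValue) {
    if (jsonValue instanceof Number) {
      return ((Number) jsonValue).intValue();
    }
    String s = jsonValue.toString();
    try {
      return Integer.parseInt(s);
    } catch (NumberFormatException e) {
      return (int) LocalDate.parse(s, DateTimeFormatter.ISO_LOCAL_DATE).toEpochDay();
    }
  }
}
{code}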
[jira] [Updated] (HUDI-7781) Filter wrong partitions when using hoodie.datasource.write.partitions.to.delete
[ https://issues.apache.org/jira/browse/HUDI-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7781: - Labels: pull-request-available (was: ) > Filter wrong partitions when using > hoodie.datasource.write.partitions.to.delete > --- > > Key: HUDI-7781 > URL: https://issues.apache.org/jira/browse/HUDI-7781 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Xinyu Zou >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7776) Simplify HoodieStorage instance fetching
[ https://issues.apache.org/jira/browse/HUDI-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7776: - Labels: pull-request-available (was: ) > Simplify HoodieStorage instance fetching > > > Key: HUDI-7776 > URL: https://issues.apache.org/jira/browse/HUDI-7776 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7778) Duplicate Key exception with RLI
[ https://issues.apache.org/jira/browse/HUDI-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7778: - Labels: pull-request-available (was: ) > Duplicate Key exception with RLI > - > > Key: HUDI-7778 > URL: https://issues.apache.org/jira/browse/HUDI-7778 > Project: Apache Hudi > Issue Type: Bug > Components: metadata >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > We are occasionally hitting an exception like the one below, meaning two records were > ingested into RLI for the same record key from the data table. This is not expected > to happen. > > {code:java} > Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while > appending records to > file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y14gn/T/junit2792173348364470678/.hoodie/metadata/record_index/.record-index-0009-0_00011.log.3_3-275-476 > at > org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475) > at > org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:439) > at > org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90) > at > org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:355) > ... 28 moreCaused by: org.apache.hudi.exception.HoodieException: > Writing multiple records with same key 1 not supported for > org.apache.hudi.common.table.log.block.HoodieHFileDataBlock at > org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:146) > at > org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:121) > at > org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:166) > at > org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:467) > ... 
31 more
> Driver stacktrace:
> 51301 [main] INFO org.apache.spark.scheduler.DAGScheduler [] - Job 78 failed: collect at HoodieJavaRDD.java:177, took 0.245313 s
> 51303 [main] INFO org.apache.hudi.client.BaseHoodieClient [] - Stopping Timeline service !!
> 51303 [main] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closing Timeline server
> 51303 [main] INFO org.apache.hudi.timeline.service.TimelineService [] - Closing Timeline Service
> 51321 [main] INFO org.apache.hudi.timeline.service.TimelineService [] - Closed Timeline Service
> 51321 [main] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closed Timeline server
> org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 197001012
> at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:80)
> at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:47)
> at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:98)
> at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:88)
> at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:156)
> at org.apache.hudi.functional.TestGlobalIndexEnableUpdatePartitions.testUdpateSubsetOfRecUpdates(TestGlobalIndexEnableUpdatePartitions.java:225)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
> at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92) {code}
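The `Caused by` above points at HoodieHFileDataBlock#serializeRecords, which rejects duplicate keys because the HFile format requires unique, sorted keys. A minimal sketch of that kind of guard (simplified, not the actual Hudi code):

{code:java}
import java.util.TreeSet;

public class UniqueKeyGuardSketch {
  // HFile entries must be sorted and unique by key, so a duplicate record key
  // has to be rejected (or deduplicated upstream) before serialization.
  static void checkUnique(Iterable<String> recordKeys) {
    TreeSet<String> seen = new TreeSet<>();
    for (String key : recordKeys) {
      if (!seen.add(key)) {
        throw new IllegalStateException(
            "Writing multiple records with same key " + key + " not supported");
      }
    }
  }
}
{code}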
[jira] [Updated] (HUDI-7775) Remove unused APIs in HoodieStorage
[ https://issues.apache.org/jira/browse/HUDI-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7775: - Labels: pull-request-available (was: ) > Remove unused APIs in HoodieStorage > --- > > Key: HUDI-7775 > URL: https://issues.apache.org/jira/browse/HUDI-7775 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7761) Make the manifest Writer Extendable
[ https://issues.apache.org/jira/browse/HUDI-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7761: - Labels: pull-request-available (was: ) > Make the manifest Writer Extendable > --- > > Key: HUDI-7761 > URL: https://issues.apache.org/jira/browse/HUDI-7761 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sivaguru Kannan >Priority: Major > Labels: pull-request-available > > * Make the manifest writer extendable such that clients can plug in a custom > manifest writer instance for their syncs -- This message was sent by Atlassian Jira (v8.20.10#820010)
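A common way to make a writer pluggable is an interface plus a config-driven reflective factory; the sketch below is illustrative only (the interface and method names are assumptions, not Hudi's API):

{code:java}
import java.util.List;

// Hypothetical extension point: clients implement this and register their
// class name via a config so the sync can instantiate it reflectively.
public interface ManifestWriterSketch {
  void writeManifest(List<String> baseFilePaths) throws Exception;

  static ManifestWriterSketch create(String writerClassName) throws Exception {
    return (ManifestWriterSketch) Class.forName(writerClassName)
        .getDeclaredConstructor().newInstance();
  }
}
{code}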
[jira] [Updated] (HUDI-5505) Compaction NUM_COMMITS policy should only judge completed deltacommit
[ https://issues.apache.org/jira/browse/HUDI-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5505: - Labels: pull-request-available (was: ) > Compaction NUM_COMMITS policy should only judge completed deltacommit > - > > Key: HUDI-5505 > URL: https://issues.apache.org/jira/browse/HUDI-5505 > Project: Apache Hudi > Issue Type: Bug > Components: compaction, table-service >Reporter: HunterXHunter >Priority: Major > Labels: pull-request-available > Attachments: image-2023-01-05-13-10-57-918.png > > > `compaction.delta_commits = 1` > > {code:java} > 20230105115229301.deltacommit > 20230105115229301.deltacommit.inflight > 20230105115229301.deltacommit.requested > 20230105115253118.commit > 20230105115253118.compaction.inflight > 20230105115253118.compaction.requested > 20230105115330994.deltacommit.inflight > 20230105115330994.deltacommit.requested{code} > `ScheduleCompactionActionExecutor.needCompact` returns `true`, which is not > expected. > > In OCC or lazy-clean mode, this will cause compaction to trigger early. > `compaction.delta_commits = 3` > > {code:java} > 20230105125650541.deltacommit.inflight > 20230105125650541.deltacommit.requested > 20230105125715081.deltacommit > 20230105125715081.deltacommit.inflight > 20230105125715081.deltacommit.requested > 20230105130018070.deltacommit.inflight > 20230105130018070.deltacommit.requested {code} > > Compaction will still be triggered, which is not expected. > !image-2023-01-05-13-10-57-918.png|width=699,height=158! > -- This message was sent by Atlassian Jira (v8.20.10#820010)
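In the first timeline above, only 20230105115229301 has a completed deltacommit; the pending 20230105115330994 should not count toward the threshold. A minimal sketch of counting only completed deltacommits from timeline file names (simplified relative to the real executor):

{code:java}
import java.util.List;

public class NumCommitsPolicySketch {
  // A completed deltacommit is the bare ".deltacommit" file; the ".inflight"
  // and ".requested" files mark pending states and must be ignored.
  static boolean needCompact(List<String> timelineFiles, int deltaCommitsThreshold) {
    long completed = timelineFiles.stream()
        .filter(f -> f.endsWith(".deltacommit"))
        .count();
    return completed >= deltaCommitsThreshold;
  }
}
{code}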
[jira] [Updated] (HUDI-7770) Bootstrap read tries to parse partition from the bootstrap base path
[ https://issues.apache.org/jira/browse/HUDI-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7770: - Labels: pull-request-available (was: ) > Bootstrap read tries to parse partition from the bootstrap base path > > > Key: HUDI-7770 > URL: https://issues.apache.org/jira/browse/HUDI-7770 > Project: Apache Hudi > Issue Type: Bug > Components: bootstrap, spark >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > > Bootstrap gets the partition path values from the bootstrap base path when > reading the base file, but from the Hudi table in all other cases. Just use > the Hudi table path in all cases to keep partition parsing simpler. -- This message was sent by Atlassian Jira (v8.20.10#820010)
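For illustration, partition parsing here amounts to relativizing a file's directory against a base path, so which base path is used matters; a minimal sketch (not the actual Hudi code):

{code:java}
import java.nio.file.Path;
import java.nio.file.Paths;

public class PartitionParsingSketch {
  // Relativize the file's parent directory against the table base path;
  // e.g. base "/tbl" and file "/tbl/2024/05/17/f.parquet" -> "2024/05/17".
  static String partitionPath(String tableBasePath, String filePath) {
    Path base = Paths.get(tableBasePath);
    Path parent = Paths.get(filePath).getParent();
    return base.relativize(parent).toString(); // "" for non-partitioned tables
  }
}
{code}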
[jira] [Updated] (HUDI-7772) HoodieTimelineArchiver#getCommitInstantsToArchive needs to skip limiting archiving of instants
[ https://issues.apache.org/jira/browse/HUDI-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7772: - Labels: pull-request-available (was: ) > HoodieTimelineArchiver#getCommitInstantsToArchive needs to skip limiting > archiving of instants > --- > > Key: HUDI-7772 > URL: https://issues.apache.org/jira/browse/HUDI-7772 > Project: Apache Hudi > Issue Type: Improvement > Components: archiving >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > > When a user alters a table by adding a column and then inserts new data with > the metadata table (MDT) enabled, the write errors out as follows; from the > stack we find that FileSystemBackedTableMetadata does not support it. > org.apache.hudi.exception.HoodieException: Error limiting instant archival > based on metadata table > at > org.apache.hudi.client.HoodieTimelineArchiver.getInstantsToArchive(HoodieTimelineArchiver.java:522) > at > org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:167) > at > org.apache.hudi.client.BaseHoodieTableServiceClient.archive(BaseHoodieTableServiceClient.java:791) > at > org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:890) > at > org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:619) > at > org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:585) > at > org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:248) > at > org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:104) > at > org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1020) > at > org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:405) > at > org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:108) > at > org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.run(InsertIntoHoodieTableCommand.scala:61) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) > at > 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transfo
[jira] [Updated] (HUDI-7769) Fix Hudi CDC read on Spark 3.3.4 and 3.4.3
[ https://issues.apache.org/jira/browse/HUDI-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7769: - Labels: pull-request-available (was: ) > Fix Hudi CDC read on Spark 3.3.4 and 3.4.3 > -- > > Key: HUDI-7769 > URL: https://issues.apache.org/jira/browse/HUDI-7769 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7771) Make default hoodie record payload as OverwriteWithLatestPayload for 0.15.0
[ https://issues.apache.org/jira/browse/HUDI-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7771: - Labels: pull-request-available (was: ) > Make default hoodie record payload as OverwriteWithLatestPayload for 0.15.0 > --- > > Key: HUDI-7771 > URL: https://issues.apache.org/jira/browse/HUDI-7771 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0 > > > We made "DefaultHoodieRecordPayload" the default for 1.x, but let's keep it as > OverwriteWithLatestAvroPayload for 0.15.0. -- This message was sent by Atlassian Jira (v8.20.10#820010)
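For users who want to be explicit regardless of the release default, the payload class can be pinned via writer options; a minimal sketch (the option map is illustrative, the config key and payload class are existing Hudi names):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class PayloadConfigExample {
  public static void main(String[] args) {
    Map<String, String> writeOpts = new HashMap<>();
    // Pin the 0.x default payload explicitly instead of relying on the default.
    writeOpts.put("hoodie.datasource.write.payload.class",
        "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload");
    System.out.println(writeOpts);
  }
}
{code}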
[jira] [Updated] (HUDI-7767) Revert Spark 3.3 and 3.4 upgrades
[ https://issues.apache.org/jira/browse/HUDI-7767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7767: - Labels: pull-request-available (was: ) > Revert Spark 3.3 and 3.4 upgrades > -- > > Key: HUDI-7767 > URL: https://issues.apache.org/jira/browse/HUDI-7767 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7766) Adding staging jar deployment command for Spark 3.5 and Scala 2.13 profile
[ https://issues.apache.org/jira/browse/HUDI-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7766: - Labels: pull-request-available (was: ) > Adding staging jar deployment command for Spark 3.5 and Scala 2.13 profile > -- > > Key: HUDI-7766 > URL: https://issues.apache.org/jira/browse/HUDI-7766 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7765) Turn off native HFile reader for 0.15.0 release
[ https://issues.apache.org/jira/browse/HUDI-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7765: - Labels: pull-request-available (was: ) > Turn off native HFile reader for 0.15.0 release > --- > > Key: HUDI-7765 > URL: https://issues.apache.org/jira/browse/HUDI-7765 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7768) Fix failing tests for 0.15.0 release (async compaction and metadata num commits check)
[ https://issues.apache.org/jira/browse/HUDI-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7768: - Labels: pull-request-available (was: ) > Fix failing tests for 0.15.0 release (async compaction and metadata num > commits check) > -- > > Key: HUDI-7768 > URL: https://issues.apache.org/jira/browse/HUDI-7768 > Project: Apache Hudi > Issue Type: Improvement > Components: tests-ci >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > > > [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=23953=logs=600e7de6-e133-5e69-e615-50ee129b3c08=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7] > [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=23953=logs=7601efb9-4019-552e-11ba-eb31b66593b2=d4b4e11d-8e26-50e5-a0d9-bb2d5decfeb9] > org.apache.hudi.exception.HoodieMetadataException: Metadata table's > deltacommits exceeded 3: this is likely caused by a pending instant in the > data table. Resolve the pending instant or adjust > `hoodie.metadata.max.deltacommits.when_pending`, then restart the pipeline. > at > org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.checkNumDeltaCommits(HoodieBackedTableMetadataWriter.java:835) > at > org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.validateTimelineBeforeSchedulingCompaction(HoodieBackedTableMetadataWriter.java:1367) > java.lang.IllegalArgumentException: Following instants have timestamps >= > compactionInstant (002) Instants > :[[004__deltacommit__COMPLETED__20240515123806398]] at > org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:42) > at > org.apache.hudi.table.action.compact.ScheduleCompactionActionExecutor.execute(ScheduleCompactionActionExecutor.java:108) > -- This message was sent by Atlassian Jira (v8.20.10#820010)
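The error message itself names the relevant knob, `hoodie.metadata.max.deltacommits.when_pending`. A minimal sketch of raising it in writer options (the value 1000 is an arbitrary illustration, not a recommendation):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class MetadataDeltaCommitsConfigExample {
  public static void main(String[] args) {
    Map<String, String> opts = new HashMap<>();
    // Allow more metadata-table deltacommits while a data-table instant is pending.
    opts.put("hoodie.metadata.max.deltacommits.when_pending", "1000");
    System.out.println(opts);
  }
}
{code}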
[jira] [Updated] (HUDI-7764) DefaultHoodieRecordPayload should be projection compatible
[ https://issues.apache.org/jira/browse/HUDI-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7764: - Labels: pull-request-available (was: ) > DefaultHoodieRecordPayload should be projection compatible > -- > > Key: HUDI-7764 > URL: https://issues.apache.org/jira/browse/HUDI-7764 > Project: Apache Hudi > Issue Type: Bug > Components: spark >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Blocker > Labels: pull-request-available > Fix For: 0.15.0 > > > DefaultHoodieRecordPayload is not listed as projection compatible. Therefore, > with the relation reader we end up reading all the columns for MOR reads. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7763) Fix that jmx reporter cannot be initialized if metadata is enabled
[ https://issues.apache.org/jira/browse/HUDI-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7763: - Labels: metrics pull-request-available (was: metrics) > Fix that jmx reporter cannot be initialized if metadata is enabled > > > Key: HUDI-7763 > URL: https://issues.apache.org/jira/browse/HUDI-7763 > Project: Apache Hudi > Issue Type: Bug > Environment: hudi0.14.1, Spark3.2 >Reporter: Jihwan Lee >Priority: Major > Labels: metrics, pull-request-available > > If the JMX metrics option is activated, the port setting can be set to a range. > > Because the metadata table is also written as a Hudi table, multiple metrics > instances are required (otherwise the exception 'ObjID already in use' occurs). > Each JmxReporterServer can only use one port, so the JMX server should be able > to be initialized on multiple ports. > > Error log > (the JMX reporter for the metadata table is initialized first, then the > reporter for the data table hits the exception): > {code:java} > 24/05/13 20:28:27 INFO table.HoodieTableMetaClient: Loading > HoodieTableMetaClient from > /data/feeder/affiliate/book/affiliate_feeder_book_svc > 24/05/13 20:28:27 INFO table.HoodieTableConfig: Loading table properties from > /data/feeder/affiliate/book/affiliate_feeder_book_svc/.hoodie/hoodie.properties > 24/05/13 20:28:27 INFO table.HoodieTableMetaClient: Finished Loading Table of > type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from > /data/feeder/affiliate/book/affiliate_feeder_book_svc > 24/05/13 20:28:27 INFO table.HoodieTableMetaClient: Loading > HoodieTableMetaClient from > /data/feeder/affiliate/book/affiliate_feeder_book_svc > 24/05/13 20:28:27 INFO table.HoodieTableConfig: Loading table properties from > /data/feeder/affiliate/book/affiliate_feeder_book_svc/.hoodie/hoodie.properties > 24/05/13 20:28:27 INFO table.HoodieTableMetaClient: Finished Loading Table of > type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from > /data/feeder/affiliate/book/affiliate_feeder_book_svc > 24/05/13 20:28:27 INFO timeline.HoodieActiveTimeline: Loaded instants upto : > Option{val=[==>20240513195519782__deltacommit__REQUESTED__20240513195521160]} > 24/05/13 20:28:28 INFO config.HoodieWriteConfig: Automatically set > hoodie.cleaner.policy.failed.writes=LAZY since optimistic concurrency control > is used > 24/05/13 20:28:28 INFO metrics.JmxMetricsReporter: Started JMX server on port > 9889. > 24/05/13 20:28:28 INFO metrics.JmxMetricsReporter: Configured JMXReporter > with {port:9889} > 24/05/13 20:28:28 INFO embedded.EmbeddedTimelineService: Overriding hostIp to > (feeder-affiliate-book-svc-sink-09c3c08f71b47a5d-driver-svc.csp.svc) found in > spark-conf. It was null > 24/05/13 20:28:28 INFO view.FileSystemViewManager: Creating View Manager with > storage type :MEMORY > 24/05/13 20:28:28 INFO view.FileSystemViewManager: Creating in-memory based > Table View > 24/05/13 20:28:28 INFO util.log: Logging initialized @53678ms to > org.apache.hudi.org.apache.jetty.util.log.Slf4jLog > 24/05/13 20:28:28 INFO javalin.Javalin: > __ __ _ __ __ > / / _ _ __ _ / /(_) / // / > __ / // __ `/| | / // __ `// // // __ \ / // /_ > / /_/ // /_/ / | |/ // /_/ // // // / / / /__ __/ > \/ \__,_/ |___/ \__,_//_//_//_/ /_/ /_/ > https://javalin.io/documentation > 24/05/13 20:28:28 INFO javalin.Javalin: Starting Javalin ... > 24/05/13 20:28:28 INFO javalin.Javalin: You are running Javalin 4.6.7 > (released October 24, 2022. Your Javalin version is 567 days old. Consider > checking for a newer version.). 
> 24/05/13 20:28:28 INFO server.Server: jetty-9.4.48.v20220622; built: > 2022-06-21T20:42:25.880Z; git: 6b67c5719d1f4371b33655ff2d047d24e171e49a; jvm > 11.0.20.1+1 > 24/05/13 20:28:28 INFO server.Server: Started @54065ms > 24/05/13 20:28:28 INFO javalin.Javalin: Listening on http://localhost:35071/ > 24/05/13 20:28:28 INFO javalin.Javalin: Javalin started in 177ms \o/ > 24/05/13 20:28:28 INFO service.TimelineService: Starting Timeline server on > port :35071 > 24/05/13 20:28:28 INFO embedded.EmbeddedTimelineService: Started embedded > timeline server at > feeder-affiliate-book-svc-sink-09c3c08f71b47a5d-driver-svc.csp.svc:35071 > 24/05/13 20:28:28 INFO client.BaseHoodieClient: Timeline Server already > running. Not restarting the service > 24/05/13 20:28:28 INFO hudi.HoodieSparkSqlWriterInternal: > Config.inlineCompactionEnabled ? true > 24/05/13 20:28:28 INFO hudi.HoodieSparkSqlWr
[jira] [Updated] (HUDI-7762) Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In Spark3.5
[ https://issues.apache.org/jira/browse/HUDI-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7762: - Labels: pull-request-available (was: ) > Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In > Spark3.5 > - > > Key: HUDI-7762 > URL: https://issues.apache.org/jira/browse/HUDI-7762 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ma Jian >Priority: Major > Labels: pull-request-available > > In Hudi, the Spark3_5Adapter calls v2.v1Table, which in turn invokes the logic > within Delta. When executed on a Delta table, this may result in an error. > Therefore, the logic that determines whether it is a Hudi operation has been > changed to use class-name checks, preventing errors during Delta Lake > executions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
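A minimal sketch of the class-name-check idea (the predicate below is an illustrative assumption, not the actual patch):

{code:java}
public class TableFormatCheckSketch {
  // Decide from the catalog object's class name instead of calling v1Table,
  // which could execute Delta code paths on a Delta table.
  static boolean looksLikeHoodieTable(Object catalogTable) {
    return catalogTable.getClass().getName().toLowerCase().contains("hoodie");
  }
}
{code}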
[jira] [Updated] (HUDI-7759) Remove Hadoop dependencies in hudi-common module
[ https://issues.apache.org/jira/browse/HUDI-7759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7759: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Remove Hadoop dependencies in hudi-common module > > > Key: HUDI-7759 > URL: https://issues.apache.org/jira/browse/HUDI-7759 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7758) MDT Initialization Parses Non-Hudi files
[ https://issues.apache.org/jira/browse/HUDI-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7758: - Labels: pull-request-available (was: ) > MDT Initialization Parses Non-Hudi files > > > Key: HUDI-7758 > URL: https://issues.apache.org/jira/browse/HUDI-7758 > Project: Apache Hudi > Issue Type: Bug >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Major > Labels: pull-request-available > > Right now the MDT initialization will parse files that do not belong to the > Hudi table -- This message was sent by Atlassian Jira (v8.20.10#820010)
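A minimal sketch of the kind of filter that avoids this, assuming Hudi data files can be recognized by name (the extension checks are illustrative assumptions):

{code:java}
public class HudiFileFilterSketch {
  // Skip anything that cannot be a Hudi data file for this table, e.g. stray
  // files dropped into the table directory by other tools.
  static boolean isHudiDataFile(String fileName) {
    return fileName.endsWith(".parquet") || fileName.contains(".log.");
  }
}
{code}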
[jira] [Updated] (HUDI-7717) hoodie.combine.before.insert silently broken for bulk_insert if meta fields disabled (causes duplicates)
[ https://issues.apache.org/jira/browse/HUDI-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7717: - Labels: pull-request-available (was: ) > hoodie.combine.before.insert silently broken for bulk_insert if meta fields > disabled (causes duplicates) > > > Key: HUDI-7717 > URL: https://issues.apache.org/jira/browse/HUDI-7717 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Reporter: Aditya Goenka >Assignee: Geser Dugarov >Priority: Critical > Labels: pull-request-available > Fix For: 0.15.0 > > > Github issue - [https://github.com/apache/hudi/issues/11044] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7752) Abstract serializeRecords for log writing
[ https://issues.apache.org/jira/browse/HUDI-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7752: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Abstract serializeRecords for log writing > - > > Key: HUDI-7752 > URL: https://issues.apache.org/jira/browse/HUDI-7752 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7754) Remove AvroWriteSupport and ParquetReaderIterator from hudi-common
[ https://issues.apache.org/jira/browse/HUDI-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7754: - Labels: pull-request-available (was: ) > Remove AvroWriteSupport and ParquetReaderIterator from hudi-common > -- > > Key: HUDI-7754 > URL: https://issues.apache.org/jira/browse/HUDI-7754 > Project: Apache Hudi > Issue Type: Task >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > Two classes with Hadoop deps that can be moved to hudi-hadoop-common and > aren't covered by other PRs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7750) Move HoodieLogFormatWriter class to hoodie-hadoop-common module
[ https://issues.apache.org/jira/browse/HUDI-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7750: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Move HoodieLogFormatWriter class to hoodie-hadoop-common module > --- > > Key: HUDI-7750 > URL: https://issues.apache.org/jira/browse/HUDI-7750 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7589) Add API to create HoodieStorage in HoodieIOFactory
[ https://issues.apache.org/jira/browse/HUDI-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7589: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Add API to create HoodieStorage in HoodieIOFactory > -- > > Key: HUDI-7589 > URL: https://issues.apache.org/jira/browse/HUDI-7589 > Project: Apache Hudi > Issue Type: Task >Reporter: Ethan Guo >Assignee: Jonathan Vexler >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > > We should use the HoodieIOFactory to create the HoodieStorage instance, > replacing the hardcoded reflection logic. -- This message was sent by Atlassian Jira (v8.20.10#820010)
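A rough sketch of the proposed shape, with all names as hypothetical stand-ins for the real classes: the factory exposes a typed creation method so callers stop instantiating storage via Class.forName.

{code:java}
// Hypothetical stand-ins; the real HoodieIOFactory/HoodieStorage APIs differ.
abstract class StorageSketch {
  abstract byte[] readAllBytes(String path) throws Exception;
}

public abstract class IOFactorySketch {
  // A typed API replaces call sites like:
  //   (StorageSketch) Class.forName(cfg).newInstance()
  public abstract StorageSketch getStorage(String basePath);
}
{code}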
[jira] [Updated] (HUDI-7749) Upgrade Spark patch version to include a fix related to data correctness
[ https://issues.apache.org/jira/browse/HUDI-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7749: - Labels: pull-request-available (was: ) > Upgrade Spark patch version to include a fix related to data correctness > > > Key: HUDI-7749 > URL: https://issues.apache.org/jira/browse/HUDI-7749 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Sagar Sumit >Priority: Major > Labels: pull-request-available > > https://issues.apache.org/jira/browse/SPARK-44805 shows a data correctness > issue with Spark 3.3.1 and 3.4.1. We have already upgraded to Spark 3.4.3 in > [https://github.com/apache/hudi/commit/cdd146b2c73d50a28bee9f712b689df4fc923222]. > We should upgrade to 3.3.4. The issue does not affect 3.2.x. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7748) Add logs and drop _hoodie_is_deleted in Transformer
[ https://issues.apache.org/jira/browse/HUDI-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7748: - Labels: pull-request-available (was: ) > Add logs and drop _hoodie_is_deleted in Transformer > --- > > Key: HUDI-7748 > URL: https://issues.apache.org/jira/browse/HUDI-7748 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Sagar Sumit >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common
[ https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7745: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Move Hadoop-dependent util methods to hudi-hadoop-common > > > Key: HUDI-7745 > URL: https://issues.apache.org/jira/browse/HUDI-7745 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7744) Create HoodieIOFactory and config to set it
[ https://issues.apache.org/jira/browse/HUDI-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7744: - Labels: pull-request-available (was: ) > Create HoodieIOFactory and config to set it > --- > > Key: HUDI-7744 > URL: https://issues.apache.org/jira/browse/HUDI-7744 > Project: Apache Hudi > Issue Type: Improvement > Components: reader-core, writer-core >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > Create HoodieIOFactory that will give the appropriate reader and writer > factories based on a config. -- This message was sent by Atlassian Jira (v8.20.10#820010)
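For illustration, a config-driven factory lookup often boils down to one reflective load of the factory class, after which everything is typed; a minimal sketch with assumed names (`hoodie.io.factory.class` and the default class are hypothetical):

{code:java}
import java.util.Map;

public class IOFactoryLoaderSketch {
  // One reflective hop selects the factory; readers and writers then come
  // from typed factory methods rather than scattered Class.forName calls.
  static Object createIOFactory(Map<String, String> conf) throws Exception {
    String clazz = conf.getOrDefault("hoodie.io.factory.class",
        "org.example.DefaultIOFactorySketch"); // hypothetical default
    return Class.forName(clazz).getDeclaredConstructor().newInstance();
  }
}
{code}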
[jira] [Updated] (HUDI-7731) Fix usage of new Configuration() in production code
[ https://issues.apache.org/jira/browse/HUDI-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7731: - Labels: pull-request-available (was: ) > Fix usage of new Configuration() in production code > --- > > Key: HUDI-7731 > URL: https://issues.apache.org/jira/browse/HUDI-7731 > Project: Apache Hudi > Issue Type: Improvement > Components: core >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > new Configuration() is used in non-test code in several places: > HoodieParquetDataBlock.java > Metrics.java > -- This message was sent by Atlassian Jira (v8.20.10#820010)
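For context, a minimal sketch of why this matters: a fresh Configuration() only loads classpath defaults and silently drops settings applied at runtime, so the existing configuration should be threaded through (or copied via the copy constructor).

{code:java}
import org.apache.hadoop.conf.Configuration;

public class ConfPropagationSketch {
  // Anti-pattern: sees only *-default.xml / *-site.xml from the classpath,
  // losing credentials, filesystem settings, etc. applied at runtime.
  static Configuration fresh() {
    return new Configuration();
  }

  // Preferred: derive from the configuration already flowing through the
  // caller; the copy constructor preserves runtime settings.
  static Configuration derived(Configuration existing) {
    return new Configuration(existing);
  }
}
{code}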
[jira] [Updated] (HUDI-7742) Move Hadoop-dependent reader util classes to hudi-hadoop-common module
[ https://issues.apache.org/jira/browse/HUDI-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7742: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Move Hadoop-dependent reader util classes to hudi-hadoop-common module > -- > > Key: HUDI-7742 > URL: https://issues.apache.org/jira/browse/HUDI-7742 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7743) Fix simple mistakes with StoragePath in production code.
[ https://issues.apache.org/jira/browse/HUDI-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7743: - Labels: pull-request-available (was: ) > Fix simple mistakes with StoragePath in production code. > > > Key: HUDI-7743 > URL: https://issues.apache.org/jira/browse/HUDI-7743 > Project: Apache Hudi > Issue Type: Task > Components: code-quality >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > > Fix many simple mistakes with StoragePath, such as doing extra conversions, > not using util methods, etc. > Don't fix any mistakes in tests for now. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7729) Move ParquetUtils to hudi-hadoop-common
[ https://issues.apache.org/jira/browse/HUDI-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7729: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Move ParquetUtils to hudi-hadoop-common > --- > > Key: HUDI-7729 > URL: https://issues.apache.org/jira/browse/HUDI-7729 > Project: Apache Hudi > Issue Type: Task > Components: core >Reporter: Jonathan Vexler >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > > Move ParquetUtils to hudi-hadoop-common. The methods that are called directly > from hudi-common should be abstracted into the base utils class, and the ORC > utils should throw 'not implemented' for them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
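A minimal sketch of that split, with illustrative names (not the real BaseFileUtils hierarchy): the base class declares the method, and the ORC side throws until it is implemented.

{code:java}
public abstract class FileUtilsSketch {
  // Declared on the base so hudi-common can call it without knowing the format.
  public abstract long fetchRecordCount(String filePath);
}

class OrcUtilsSketch extends FileUtilsSketch {
  @Override
  public long fetchRecordCount(String filePath) {
    throw new UnsupportedOperationException("Not implemented for ORC yet");
  }
}
{code}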
[jira] [Updated] (HUDI-5616) Docs update for specifying org.apache.spark.HoodieSparkKryoRegistrar
[ https://issues.apache.org/jira/browse/HUDI-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5616: - Labels: pull-request-available (was: ) > Docs update for specifying org.apache.spark.HoodieSparkKryoRegistrar > > > Key: HUDI-5616 > URL: https://issues.apache.org/jira/browse/HUDI-5616 > Project: Apache Hudi > Issue Type: Improvement > Components: docs >Reporter: Ethan Guo >Assignee: Shiyan Xu >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0 > > > There is a usability change in [this > PR|https://github.com/apache/hudi/pull/7702] that requires a new conf for > Spark users: > --conf spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar > There will be a performance hit (it was actually always there) if this is > not specified. -- This message was sent by Atlassian Jira (v8.20.10#820010)
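The same conf can also be set programmatically; a minimal sketch (pairing it with the Kryo serializer is a common companion setting, shown here as an assumption rather than a requirement from this ticket):

{code:java}
import org.apache.spark.SparkConf;

public class KryoRegistrarExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        // Equivalent to --conf spark.kryo.registrator=... on spark-submit.
        .set("spark.kryo.registrator", "org.apache.spark.HoodieSparkKryoRegistrar")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    System.out.println(conf.toDebugString());
  }
}
{code}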
[jira] [Updated] (HUDI-7726) Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils
[ https://issues.apache.org/jira/browse/HUDI-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7726: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils > -- > > Key: HUDI-7726 > URL: https://issues.apache.org/jira/browse/HUDI-7726 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Jonathan Vexler >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7739) Shutdown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy
[ https://issues.apache.org/jira/browse/HUDI-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7739: - Labels: pull-request-available (was: ) > Shutdown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy > -- > > Key: HUDI-7739 > URL: https://issues.apache.org/jira/browse/HUDI-7739 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Xinyu Zou >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7738) FileStreamReader needs to set the charset to UTF-8
[ https://issues.apache.org/jira/browse/HUDI-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7738: - Labels: pull-request-available (was: ) > FileStreamReader needs to set the charset to UTF-8 > > > Key: HUDI-7738 > URL: https://issues.apache.org/jira/browse/HUDI-7738 > Project: Apache Hudi > Issue Type: Improvement > Components: cli >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > > FileStreamReader needs to set the charset to UTF-8. -- This message was sent by Atlassian Jira (v8.20.10#820010)
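A minimal sketch of the fix pattern, assuming the reader wraps a file stream: pass StandardCharsets.UTF_8 explicitly instead of relying on the platform-default encoding.

{code:java}
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Utf8ReaderSketch {
  // Without an explicit charset, InputStreamReader uses file.encoding,
  // which varies across hosts and can corrupt non-ASCII input.
  static BufferedReader open(String path) throws IOException {
    return new BufferedReader(
        new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
  }
}
{code}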
[jira] [Updated] (HUDI-7737) Bump Spark 3.4 version to Spark 3.4.3
[ https://issues.apache.org/jira/browse/HUDI-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7737: - Labels: pull-request-available (was: ) > Bump Spark 3.4 version to Spark 3.4.3 > - > > Key: HUDI-7737 > URL: https://issues.apache.org/jira/browse/HUDI-7737 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Geser Dugarov >Assignee: Geser Dugarov >Priority: Major > Labels: pull-request-available > > Spark 3.4.3 has been released: https://github.com/apache/spark/tree/v3.4.3 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7735) Remove usage of SerializableConfiguration
[ https://issues.apache.org/jira/browse/HUDI-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7735: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Remove usage of SerializableConfiguration > - > > Key: HUDI-7735 > URL: https://issues.apache.org/jira/browse/HUDI-7735 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7734) Remove unused FSPermissionDTO
[ https://issues.apache.org/jira/browse/HUDI-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7734: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Remove unused FSPermissionDTO > - > > Key: HUDI-7734 > URL: https://issues.apache.org/jira/browse/HUDI-7734 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7728) Use StorageConfiguration in LockProvider constructors
[ https://issues.apache.org/jira/browse/HUDI-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7728: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Use StorageConfiguration in LockProvider constructors > - > > Key: HUDI-7728 > URL: https://issues.apache.org/jira/browse/HUDI-7728 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7727) Avoid constructAbsolutePathInHadoopPath in hudi-common module
[ https://issues.apache.org/jira/browse/HUDI-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7727: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Avoid constructAbsolutePathInHadoopPath in hudi-common module > - > > Key: HUDI-7727 > URL: https://issues.apache.org/jira/browse/HUDI-7727 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7725) Restructure HFileBootstrapIndex to separate Hadoop-dependent logic
[ https://issues.apache.org/jira/browse/HUDI-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7725: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Restructure HFileBootstrapIndex to separate Hadoop-dependent logic > -- > > Key: HUDI-7725 > URL: https://issues.apache.org/jira/browse/HUDI-7725 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Jonathan Vexler >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7723) DayBasedCompactionStrategy support io bounded
[ https://issues.apache.org/jira/browse/HUDI-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7723: - Labels: pull-request-available (was: ) > DayBasedCompactionStrategy support io bounded > - > > Key: HUDI-7723 > URL: https://issues.apache.org/jira/browse/HUDI-7723 > Project: Apache Hudi > Issue Type: Improvement > Components: compaction >Reporter: Askwang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7721) Fix broken build on master
[ https://issues.apache.org/jira/browse/HUDI-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7721: - Labels: pull-request-available (was: ) > Fix broken build on master > -- > > Key: HUDI-7721 > URL: https://issues.apache.org/jira/browse/HUDI-7721 > Project: Apache Hudi > Issue Type: Bug >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Critical > Labels: pull-request-available > > TestHoodieDeltaStreamer is invalid due to > [https://github.com/apache/hudi/pull/11099]. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7350) Introduce HoodieIOFactory to abstract the reader and writer implementation
[ https://issues.apache.org/jira/browse/HUDI-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7350: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Introduce HoodieIOFactory to abstract the reader and writer implementation > -- > > Key: HUDI-7350 > URL: https://issues.apache.org/jira/browse/HUDI-7350 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Jonathan Vexler >Priority: Blocker > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7720) Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups
[ https://issues.apache.org/jira/browse/HUDI-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7720: - Labels: pull-request-available (was: ) > Fix HoodieTableFileSystemView NPE in fetchAllStoredFileGroups > - > > Key: HUDI-7720 > URL: https://issues.apache.org/jira/browse/HUDI-7720 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > Attachments: 1280X1280.PNG > > > Job aborted due to stage failure: Task 3 in stage 35.0 failed 4 times, most > recent failure: Lost task 3.3 in stage 35.0 (TID 32175) (10-222-33-34.lan > executor 204): java.lang.NullPointerException > at java.util.ArrayList.<init>(ArrayList.java:178) > at > org.apache.hudi.common.table.view.HoodieTableFileSystemView.fetchAllStoredFileGroups(HoodieTableFileSystemView.java:308) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFileGroupsIncludingReplaced(AbstractTableFileSystemView.java:976) > at > org.apache.hudi.common.table.view.AbstractTableFileSystemView.getReplacedFileGroupsBefore(AbstractTableFileSystemView.java:989) > at > org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:104) > at > org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getReplacedFileGroupsBefore(PriorityBasedFileSystemView.java:232) > at > org.apache.hudi.table.action.clean.CleanPlanner.getReplacedFilesEligibleToClean(CleanPlanner.java:441) > at > org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:330) > at > org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:295) > at > org.apache.hudi.table.action.clean.CleanPlanner.getDeletePaths(CleanPlanner.java:493) > at > org.apache.hudi.table.action.clean.CleanPlanActionExecutor.lambda$requestClean$af5da5d2$1(CleanPlanActionExecutor.java:122) > at > org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at > scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) > at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) > at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) > at scala.collection.TraversableOnce.to(TraversableOnce.scala:366) > at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364) at > scala.collection.AbstractIterator.to(Iterator.scala:1431) at > scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358) at > scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431) > at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345) > at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339) at > scala.collection.AbstractIterator.toArray(Iterator.scala:1431) at > org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030) > at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at > org.apache.spark.scheduler.Task.run(Task.scala:131) at > 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.20.10#820010)
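The NPE originates from copying a null collection into a new ArrayList (the ArrayList.<init> frame above). A minimal sketch of the null-safe pattern (illustrative, not the actual fetchAllStoredFileGroups patch):

{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class NullSafeCopySketch {
  // new ArrayList<>(c) dereferences c immediately, so a null lookup result
  // (e.g. a missing partition entry) must be guarded before copying.
  static <T> List<T> copyOrEmpty(Collection<T> maybeNull) {
    return maybeNull == null ? Collections.emptyList() : new ArrayList<>(maybeNull);
  }
}
{code}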
[jira] [Updated] (HUDI-7718) Use source profile in HoodieIncrSource
[ https://issues.apache.org/jira/browse/HUDI-7718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7718: - Labels: pull-request-available (was: ) > Use source profile in HoodieIncrSource > -- > > Key: HUDI-7718 > URL: https://issues.apache.org/jira/browse/HUDI-7718 > Project: Apache Hudi > Issue Type: Improvement > Components: deltastreamer >Reporter: Vinish Reddy >Assignee: Vinish Reddy >Priority: Minor > Labels: pull-request-available > > Use source profile in HoodieIncrSource for utilising proper parallelism and > numInstantsPerFetch based on data volume. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7654) Implement the pre-CBO rules
[ https://issues.apache.org/jira/browse/HUDI-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7654: - Labels: pull-request-available (was: ) > Implement the pre-CBO rules > --- > > Key: HUDI-7654 > URL: https://issues.apache.org/jira/browse/HUDI-7654 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Vova Kolmakov >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7716) Add more logs around index lookup
[ https://issues.apache.org/jira/browse/HUDI-7716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7716: - Labels: pull-request-available (was: ) > Add more logs around index lookup > - > > Key: HUDI-7716 > URL: https://issues.apache.org/jira/browse/HUDI-7716 > Project: Apache Hudi > Issue Type: Improvement > Components: index >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7715) Partition TTL for Flink
[ https://issues.apache.org/jira/browse/HUDI-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7715: - Labels: pull-request-available (was: ) > Partition TTL for Flink > --- > > Key: HUDI-7715 > URL: https://issues.apache.org/jira/browse/HUDI-7715 > Project: Apache Hudi > Issue Type: Improvement >Reporter: xi chaomin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7713) Schema Reconciliation should also re-order fields
[ https://issues.apache.org/jira/browse/HUDI-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7713: - Labels: pull-request-available (was: ) > Schema Reconciliation should also re-order fields > - > > Key: HUDI-7713 > URL: https://issues.apache.org/jira/browse/HUDI-7713 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Timothy Brown >Assignee: Timothy Brown >Priority: Major > Labels: pull-request-available > > Schema reconciliation currently makes sure the incoming schema is compatible > with the target, but it can also be used to guarantee a consistent ordering > of fields in the schema between commits. -- This message was sent by Atlassian Jira (v8.20.10#820010)
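For illustration, reordering with Avro amounts to rebuilding the record schema using the target's field order; a minimal sketch assuming both schemas contain the same field names (the real reconciliation also handles missing fields and type promotion):

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.Schema.Field;

public class FieldReorderSketch {
  // Rebuild the incoming record schema with fields in the target's order;
  // Avro Field objects cannot be reused, so each one is copied.
  static Schema reorderToTarget(Schema incoming, Schema target) {
    List<Field> ordered = new ArrayList<>();
    for (Field t : target.getFields()) {
      Field f = incoming.getField(t.name());
      ordered.add(new Field(f.name(), f.schema(), f.doc(), f.defaultVal()));
    }
    return Schema.createRecord(incoming.getName(), incoming.getDoc(),
        incoming.getNamespace(), incoming.isError(), ordered);
  }
}
{code}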
[jira] [Updated] (HUDI-7712) Account for file slices instead of just base files while initializing RLI for MOR table
[ https://issues.apache.org/jira/browse/HUDI-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7712: - Labels: pull-request-available (was: ) > Account for file slices instead of just base files while initializing RLI for > MOR table > --- > > Key: HUDI-7712 > URL: https://issues.apache.org/jira/browse/HUDI-7712 > Project: Apache Hudi > Issue Type: Bug > Components: metadata >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > We could have deletes in log files, and hence we need to account for the > entire file slice instead of just base files while initializing RLI for a MOR > table. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7704) Unify test client storage classes with duplicate code
[ https://issues.apache.org/jira/browse/HUDI-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7704: - Labels: pull-request-available (was: ) > Unify test client storage classes with duplicate code > -- > > Key: HUDI-7704 > URL: https://issues.apache.org/jira/browse/HUDI-7704 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Jonathan Vexler >Assignee: Vova Kolmakov >Priority: Major > Labels: pull-request-available > > TestHoodieClientOnCopyOnWriteStorage > TestHoodieJavaClientOnCopyOnWriteStorage > have a bunch of duplicate code -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7711) Fix MultiTableStreamer to deal with the path of the properties file for each streamer
[ https://issues.apache.org/jira/browse/HUDI-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7711: - Labels: pull-request-available (was: ) > Fix MultiTableStreamer to deal with the path of the properties file for each > streamer > -- > > Key: HUDI-7711 > URL: https://issues.apache.org/jira/browse/HUDI-7711 > Project: Apache Hudi > Issue Type: Bug > Components: hudi-utilities > Environment: hudi0.14.1, Spark3.2 >Reporter: Jihwan Lee >Priority: Major > Labels: pull-request-available > > HoodieMultiTableStreamer initializes common configs, then deep-copies related > fields into each stream. > Because _propsFilePath_ on each streamer is not handled, they always fall back > to the test-file path as the default value. > > Also, when MultiTableStreamer runs with _--hoodie-conf_, each streamer > should be able to inherit these configs. > > MultiTable configs (kafka-source.properties): > > {code:java} > ... > hoodie.streamer.ingestion.tablesToBeIngested=db.tbl1,db.tb2 > hoodie.streamer.ingestion.db.tbl1.configFile=hdfs:///tmp/config_1.properties > hoodie.streamer.ingestion.db.tbl2.configFile=hdfs:///tmp/config_2.properties > ... {code} > > > /tmp/config_1.properties: > > {code:java} > ... > hoodie.datasource.write.recordkey.field=id > hoodie.streamer.source.kafka.topic=topic1 > ... {code} > > > /tmp/config_2.properties: > {code:java} > ... > hoodie.datasource.write.recordkey.field=id > hoodie.streamer.source.kafka.topic=topic2 > ... {code} > > Error log (workspace replaced with \{RUNNING_PATH}): > > {code:java} > 24/05/04 21:41:01 ERROR config.DFSPropertiesConfiguration: Error reading in > properties from dfs from file > file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties > 24/05/04 21:41:01 INFO streamer.StreamSync: Shutting down embedded timeline > server > 24/05/04 21:41:01 ERROR streamer.HoodieMultiTableStreamer: error while > running MultiTableDeltaStreamer for table: review_processed_data > org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs > from file > file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties > at > org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:168) > at > org.apache.hudi.common.config.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:87) > at > org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:258) > at > org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:453) > at > org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:714) > at > org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:676) > at > org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:568) > at > org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:540) > at > org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:444) > at > org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:874) > at > org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72) > at org.apache.hudi.common.util.Option.ifPresent(Option.java:101) > at > org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:216) > at > org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.sync(HoodieMultiTableStreamer.java:457) > at > 
org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.main(HoodieMultiTableStreamer.java:282) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache
[jira] [Updated] (HUDI-7710) BugFix: Remove compaction.inflight from conflict resolution
[ https://issues.apache.org/jira/browse/HUDI-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7710: - Labels: pull-request-available (was: ) > BugFix: Remove compaction.inflight from conflict resolution > --- > > Key: HUDI-7710 > URL: https://issues.apache.org/jira/browse/HUDI-7710 > Project: Apache Hudi > Issue Type: Improvement > Components: compaction >Reporter: Lin Liu >Assignee: Lin Liu >Priority: Critical > Labels: pull-request-available > > During conflict resolution, compaction.inflight instants are found; since > inflight compaction instants don't contain any plan information, this could > cause an NPE. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7707) Enable bundle validation on Java 8 and 11
[ https://issues.apache.org/jira/browse/HUDI-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7707: - Labels: pull-request-available (was: ) > Enable bundle validation on Java 8 and 11 > - > > Key: HUDI-7707 > URL: https://issues.apache.org/jira/browse/HUDI-7707 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0, 1.0.0 > > Attachments: Screenshot 2024-05-02 at 17.41.02.png > > > Bundle validation with Java 8 and 11 is somehow skipped in GH CI. It should > be enabled. !Screenshot 2024-05-02 at 17.41.02.png|width=905,height=325! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7706) Improve validation in PARTITION_STATS index test
[ https://issues.apache.org/jira/browse/HUDI-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7706: - Labels: pull-request-available (was: ) > Improve validation in PARTITION_STATS index test > > > Key: HUDI-7706 > URL: https://issues.apache.org/jira/browse/HUDI-7706 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > We should add the record key in MDT when validating the partition stats. -- This message was sent by Atlassian Jira (v8.20.10#820010)