kepplertreet opened a new issue, #7453: URL: https://github.com/apache/hudi/issues/7453
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? Yes
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

After the initial bulk insert, I ran a streaming job with the Hudi configs listed below. The upsert fails for a given commit time.

**To Reproduce**

Steps to reproduce the behavior:

1. Carry out a bulk insert into the table using the "Bulk Insert" Hudi configs below.
2. Run a Spark Structured Streaming application on top of it for incremental CDC, using the "Upsert" configs below (see the sketch after the config listings).

**Expected behavior**

Column stats are created and used for incremental upsert operations.

**Environment Description**

* Hudi version : 0.11.1 (EMR)
* Spark version : 3.3.0 (EMR)
* Hive version : 3.1.3 (EMR)
* EMR version : 6.8.0
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No

**Additional context**

Hudi configs:

*Bulk Insert*

```
"hoodie.table.name": <table_name>,
"hoodie.datasource.write.table.name": <table_name>,
"hoodie.datasource.write.table.type": "MERGE_ON_READ",
"hoodie.datasource.write.recordkey.field": "id",
"hoodie.datasource.write.partitionpath.field": "_year_month",
"hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.SimpleKeyGenerator",
"hoodie.datasource.hive_sync.table": <table_name>,
"hoodie.datasource.hive_sync.database": <database_name>,
"hoodie.datasource.write.row.writer.enable": "true",
"hoodie.upsert.shuffle.parallelism": 6,
"hoodie.bulkinsert.shuffle.parallelism": 338,
"hoodie.table.version": "4",
"hoodie.datasource.write.operation": "bulk_insert",
"hoodie.datasource.write.hive_style_partitioning": "false",
"hoodie.datasource.write.precombine.field": "_commit_time_ms",
"hoodie.datasource.write.commitmeta.key.prefix": "_",
"hoodie.datasource.write.insert.drop.duplicates": "false",
"hoodie.datasource.hive_sync.enable": "true",
"hoodie.datasource.hive_sync.use_jdbc": "true",
"hoodie.datasource.hive_sync.auto_create_database": "true",
"hoodie.datasource.hive_sync.support_timestamp": "false",
"hoodie.datasource.hive_sync.skip_ro_suffix": "true",
"hoodie.parquet.compression.codec": "snappy",
"hoodie.metrics.on": "false",
"hoodie.metadata.enable": "true",
"hoodie.metadata.metrics.enable": "false",
"hoodie.metadata.clean.async": "false",
"hoodie.metadata.index.column.stats.enable": "true",
"hoodie.metadata.index.bloom.filter.enable": "true",
"hoodie.datasource.compaction.async.enable": "false",
"hoodie.compact.inline": "true",
"hoodie.index.type": "BLOOM",
"hoodie.parquet.small.file.limit": 209715200,
"hoodie.parquet.max.file.size": 268435456
```

*Upsert (Spark Structured Streaming)*

| Property | Value |
| --- | --- |
| `hoodie.table.version` | `4` |
| `hoodie.datasource.write.operation` | `upsert` |
| `hoodie.datasource.write.hive_style_partitioning` | `false` |
| `hoodie.datasource.write.precombine.field` | `_commit_time_ms` |
| `hoodie.datasource.write.commitmeta.key.prefix` | `_` |
| `hoodie.datasource.write.insert.drop.duplicates` | `false` |
| `hoodie.datasource.hive_sync.enable` | `true` |
| `hoodie.datasource.hive_sync.use_jdbc` | `true` |
| `hoodie.datasource.hive_sync.auto_create_database` | `true` |
| `hoodie.datasource.hive_sync.support_timestamp` | `false` |
| `hoodie.datasource.hive_sync.skip_ro_suffix` | `true` |
| `hoodie.parquet.compression.codec` | `snappy` |
| `hoodie.metrics.on` | `true` |
| `hoodie.metrics.reporter.type` | `PROMETHEUS_PUSHGATEWAY` |
| `hoodie.metrics.pushgateway.host` | `<ip_address>` |
| `hoodie.metrics.pushgateway.port` | `<port_number>` |
| `hoodie.metrics.pushgateway.random.job.name.suffix` | `false` |
| `hoodie.metrics.pushgateway.report.period.seconds` | `30` |
| `hoodie.metadata.enable` | `true` |
| `hoodie.metadata.metrics.enable` | `true` |
| `hoodie.metadata.clean.async` | `true` |
| `hoodie.metadata.index.column.stats.enable` | `true` |
| `hoodie.metadata.index.bloom.filter.enable` | `true` |
| `hoodie.write.concurrency.mode` | `OPTIMISTIC_CONCURRENCY_CONTROL` |
| `hoodie.datasource.compaction.async.enable` | `true` |
| `hoodie.compact.schedule.inline` | `true` |
| `hoodie.compact.inline.trigger.strategy` | `NUM_COMMITS` |
| `hoodie.compact.inline.max.delta.commits` | `1` |
| `hoodie.index.type` | `BLOOM` |
| `hoodie.cleaner.policy.failed.writes` | `LAZY` |
| `hoodie.clean.automatic` | `true` |
| `hoodie.clean.async` | `true` |
| `hoodie.cleaner.commits.retained` | `2` |
| `hoodie.write.lock.client.num_retries` | `10` |
| `hoodie.write.lock.wait_time_ms_between_retry` | `1000` |
| `hoodie.write.lock.num_retries` | `15` |
| `hoodie.write.lock.wait_time_ms` | `60000` |
| `hoodie.write.lock.zookeeper.connection_timeout_ms` | `15000` |
| `hoodie.bloom.index.use.metadata` | `true` |
| `hoodie.archive.async` | `true` |
| `hoodie.table.name` | `<table_name>` |
| `hoodie.datasource.write.table.name` | `<table_name>` |
| `hoodie.datasource.write.table.type` | `MERGE_ON_READ` |
| `hoodie.datasource.write.recordkey.field` | `id` |
| `hoodie.datasource.write.partitionpath.field` | `_year_month` |
| `hoodie.datasource.write.keygenerator.class` | `org.apache.hudi.keygen.SimpleKeyGenerator` |
| `hoodie.datasource.hive_sync.table` | `<table_name>` |
| `hoodie.datasource.hive_sync.database` | `<database_name>` |
| `hoodie.metrics.pushgateway.job.name` | `<database_name><table_name>` |
| `hoodie.write.lock.zookeeper.lock_key` | `<table_name>` |
| `hoodie.insert.shuffle.parallelism` | `48` |
| `hoodie.upsert.shuffle.parallelism` | `48` |
| `hoodie.delete.shuffle.parallelism` | `48` |
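For reference, here is a minimal Scala sketch of how the two config sets above are applied in the reproduction steps. The object name, base path, and checkpoint location are placeholders of mine, not values from the actual job:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object HudiWriteSketch {
  // Placeholder locations -- substitute the real table and checkpoint paths.
  val basePath = "s3://<bucket>/<table_path>"
  val checkpointPath = "s3://<bucket>/<checkpoints>"

  // Step 1: initial bulk insert, passing the "Bulk Insert" configs above.
  def bulkInsert(df: DataFrame, bulkInsertOpts: Map[String, String]): Unit =
    df.write
      .format("hudi")
      .options(bulkInsertOpts)
      .mode(SaveMode.Overwrite)
      .save(basePath)

  // Step 2: incremental CDC upserts from a streaming DataFrame,
  // passing the "Upsert" configs above. The failure below surfaces here.
  def streamUpserts(cdc: DataFrame, upsertOpts: Map[String, String]): Unit =
    cdc.writeStream
      .format("hudi")
      .options(upsertOpts)
      .option("checkpointLocation", checkpointPath)
      .outputMode("append")
      .start(basePath)
      .awaitTermination()
}
```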
**Stacktrace**

```
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20221213180743126
	at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:64) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:46) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:89) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:76) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:155) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:213) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:307) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.HoodieStreamingSink.$anonfun$addBatch$2(HoodieStreamingSink.scala:91) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at scala.util.Try$.apply(Try.scala:213) ~[scala-library-2.12.15.jar:?]
	at org.apache.hudi.HoodieStreamingSink.$anonfun$addBatch$1(HoodieStreamingSink.scala:90) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.HoodieStreamingSink.retry(HoodieStreamingSink.scala:166) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.HoodieStreamingSink.addBatch(HoodieStreamingSink.scala:89) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$17(MicroBatchExecution.scala:660) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107) ~[spark-catalyst_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107) ~[spark-catalyst_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$16(MicroBatchExecution.scala:658) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:375) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:373) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:658) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:255) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:375) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:373) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:218) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:67) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:212) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:307) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:285) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:208) ~[spark-sql_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 6.0 failed 4 times, most recent failure: Lost task 3.3 in stage 6.0 (TID 236) (ip-192-168-2-99.ap-south-1.compute.internal executor 2): java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
	at org.apache.hudi.index.bloom.HoodieBloomIndex.lambda$loadColumnRangesFromMetaIndex$cc8e7ca2$1(HoodieBloomIndex.java:233)
	at org.apache.hudi.client.common.HoodieSparkEngineContext.lambda$flatMap$7d470b86$1(HoodieSparkEngineContext.java:137)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$flatMap$1(JavaRDDLike.scala:125)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1021)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2269)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:138)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2863) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2799) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2798) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2798) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1239) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1239) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at scala.Option.foreach(Option.scala:407) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1239) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3051) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2993) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2982) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1009) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2229) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2250) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2269) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2294) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1021) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:406) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1020) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.hudi.client.common.HoodieSparkEngineContext.flatMap(HoodieSparkEngineContext.java:137) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.index.bloom.HoodieBloomIndex.loadColumnRangesFromMetaIndex(HoodieBloomIndex.java:215) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.index.bloom.HoodieBloomIndex.getBloomIndexFileInfoForPartitions(HoodieBloomIndex.java:147) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.index.bloom.HoodieBloomIndex.lookupIndex(HoodieBloomIndex.java:125) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.index.bloom.HoodieBloomIndex.tagLocation(HoodieBloomIndex.java:91) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:49) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:32) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:53) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	... 41 more
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
	at org.apache.hudi.index.bloom.HoodieBloomIndex.lambda$loadColumnRangesFromMetaIndex$cc8e7ca2$1(HoodieBloomIndex.java:233) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.hudi.client.common.HoodieSparkEngineContext.lambda$flatMap$7d470b86$1(HoodieSparkEngineContext.java:137) ~[hudi-spark3.3-bundle_2.12-0.11.1-amzn-0.jar:0.11.1-amzn-0]
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$flatMap$1(JavaRDDLike.scala:125) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) ~[scala-library-2.12.15.jar:?]
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) ~[scala-library-2.12.15.jar:?]
	at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
	at scala.collection.Iterator.foreach$(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62) ~[scala-library-2.12.15.jar:?]
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49) ~[scala-library-2.12.15.jar:?]
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366) ~[scala-library-2.12.15.jar:?]
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364) ~[scala-library-2.12.15.jar:?]
	at scala.collection.AbstractIterator.to(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358) ~[scala-library-2.12.15.jar:?]
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358) ~[scala-library-2.12.15.jar:?]
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345) ~[scala-library-2.12.15.jar:?]
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339) ~[scala-library-2.12.15.jar:?]
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1021) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2269) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.scheduler.Task.run(Task.scala:138) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) ~[spark-core_2.12-3.3.0-amzn-0.jar:3.3.0-amzn-0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_342]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_342]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_342]
```
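The innermost `Caused by` points at an unchecked cast in `HoodieBloomIndex.loadColumnRangesFromMetaIndex` (HoodieBloomIndex.java:233): a column-stats value read from the metadata table arrives as a boxed `java.lang.Integer` but is cast to `String`. Given that the record key field is `id` and `hoodie.bloom.index.use.metadata` plus column stats are enabled, this looks like the column-stats lookup failing on a non-string record key column. A minimal Scala illustration of that failure mode (my own sketch, not Hudi code):

```scala
object CastFailureIllustration {
  def main(args: Array[String]): Unit = {
    // A min/max column-stat value for a numeric column comes back boxed as Integer...
    val statValue: Any = Int.box(42)
    // ...and an unchecked cast to String fails exactly like the trace above:
    // java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
    val asString = statValue.asInstanceOf[String]
    println(asString)
  }
}
```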