Re: [I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]
nsivabalan commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-2044035691

hey @ad1happy2go: any follow-ups on this?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
rahil-c commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1788226880

cc @yihua
arunvasudevan commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1783346605

@ad1happy2go Messaged you on Hudi Slack. We can discuss this issue further there. Thanks!
ad1happy2go commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1776508692

@arunvasudevan Are you on the Hudi Slack? If so, can you message me there so we can set up a call to understand the issue better? Thanks.
arunvasudevan commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1771769666

@ad1happy2go Let me know if you need any more info.
arunvasudevan commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1767272065

Yes, I checked the archived folder and it is empty in this case. Here are the writer configurations:

hoodie.datasource.hive_sync.database:
hoodie.datasource.hive_sync.mode: HMS
hoodie.datasource.write.precombine.field: source_ts_ms
hoodie.datasource.hive_sync.partition_extractor_class: org.apache.hudi.hive.NonPartitionedExtractor
hoodie.parquet.max.file.size: 67108864
hoodie.datasource.meta.sync.enable: true
hoodie.datasource.hive_sync.skip_ro_suffix: true
hoodie.metadata.enable: false
hoodie.datasource.hive_sync.table:
hoodie.index.type: SIMPLE
hoodie.clean.automatic: true
hoodie.datasource.write.operation: upsert
hoodie.metrics.reporter.type: CLOUDWATCH
hoodie.datasource.hive_sync.enable: true
hoodie.datasource.write.recordkey.field: version_id
hoodie.table.name: ride_version
hoodie.datasource.hive_sync.jdbcurl: jdbc:hive2://ip-:1
hoodie.datasource.write.table.type: MERGE_ON_READ
hoodie.simple.index.parallelism: 240
hoodie.write.lock.dynamodb.partition_key:
hoodie.cleaner.policy: KEEP_LATEST_BY_HOURS
hoodie.compact.inline: true
hoodie.client.heartbeat.interval_in_ms: 60
hoodie.datasource.compaction.async.enable: true
hoodie.metrics.on: true
hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.cleaner.policy.failed.writes: LAZY
hoodie.keep.max.commits: 1650
hoodie.cleaner.hours.retained: 168
hoodie.write.lock.dynamodb.table: peloton-prod-hudi-write-lock
hoodie.write.lock.provider: org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
hoodie.keep.min.commits: 1600
hoodie.datasource.write.partitionpath.field:
hoodie.compact.inline.max.delta.commits: 1
hoodie.write.concurrency.mode: optimistic_concurrency_control
hoodie.write.lock.dynamodb.region: us-east-1
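As a point of reference, the writer configurations above would typically be handed to a Spark DataFrame writer as an options map. The following is a minimal sketch (not taken from the issue) showing how a subset of those settings could be assembled; the `df` and `base_path` names in the commented write call are hypothetical placeholders.

```python
# Subset of the reported writer configurations, expressed as the options map
# a PySpark Hudi writer would receive. Values are copied from the comment above.
hudi_options = {
    "hoodie.table.name": "ride_version",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "version_id",
    "hoodie.datasource.write.precombine.field": "source_ts_ms",
    "hoodie.datasource.write.keygenerator.class":
        "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
    "hoodie.index.type": "SIMPLE",
    "hoodie.clean.automatic": "true",
    "hoodie.cleaner.policy": "KEEP_LATEST_BY_HOURS",
    "hoodie.cleaner.hours.retained": "168",
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "1",
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
}

# With a SparkSession and a DataFrame `df` in scope, the write would look like:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```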
ad1happy2go commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1763776089

@arunvasudevan Thanks for raising this. If I have understood it correctly, your issue is that the cleaner occasionally fails to clean up a file. You mentioned that you don't see that file in any commit timeline. Did you check the archived commits too? Can you also provide your writer configurations?
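Archived commits live under the table's `.hoodie/archived/` folder (per `hoodie.archivelog.folder=archived` in the table properties below). A hedged sketch of one way to check whether that folder contains anything, assuming the table path is reachable as a filesystem path (for S3 one would list the prefix instead); `table_path` is a placeholder, not a value from the issue:

```python
from pathlib import Path

def archived_commit_files(table_path: str) -> list[str]:
    """List file names under the table's archived timeline folder, if present.

    Returns an empty list when the archived folder does not exist or is empty,
    which is what the reporter observed for this table.
    """
    archived = Path(table_path) / ".hoodie" / "archived"
    if not archived.is_dir():
        return []
    return sorted(p.name for p in archived.iterdir())
```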
[I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]
arunvasudevan opened a new issue, #9861:
URL: https://github.com/apache/hudi/issues/9861

We are running an inline compaction and cleaning process for MoR tables on EMR 6.12 (i.e. Hudi 0.13) and are facing a NoSuchElementException on a file ID. The specific file is present in S3, but it is not in any commit timeline. This is a low-frequency table that receives an insert record roughly once a month, and our cleaner and compactor are configured to run inline. However, the cleaner does not clean up the file, causing the writer to fail. Another file with a different file ID is present on the same path with updated data. To resolve the issue we deleted the orphaned stale file, but it is really not clear why this is occurring.

Here is the hoodie.properties file:

hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
hoodie.table.type=MERGE_ON_READ
hoodie.table.metadata.partitions=
hoodie.table.precombine.field=source_ts_ms
hoodie.table.partition.fields=
hoodie.archivelog.folder=archived
hoodie.timeline.layout.version=1
hoodie.table.checksum=4106800621
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.timeline.timezone=LOCAL
hoodie.table.name=performer_ride_join_table
hoodie.table.recordkey.fields=performer_id,ride_id
hoodie.compaction.record.merger.strategy=eeb8d96f-b1e4-49fd-bbf8-28ac514178e5
hoodie.datasource.write.hive_style_partitioning=false
hoodie.partition.metafile.use.base.format=false
hoodie.table.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.populate.meta.fields=true
hoodie.table.base.file.format=PARQUET
hoodie.database.name=
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.version=5

**To Reproduce**

Steps to reproduce the behavior (this does not happen regularly; in the past 3 weeks it happened twice, about once per week):
1. Create a Hudi writer with the above config
2. Set HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS.key() -> "1"
3. Insert a record

**Expected behavior**

The cleaner should have cleaned up the orphaned file.

**Environment Description**
* Hudi version : 0.13
* Spark version : 3.4.0
* Hive version : 3.1.3
* Hadoop version : 3.3.3
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No

**Stacktrace**

Error:
23/10/09 17:27:39 ERROR Client: Application diagnostics message: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 174.0 failed 4 times, most recent failure: Lost task 0.3 in stage 174.0 (TID 30924) (ip-10-11-117-156.ec2.internal executor 28): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:336)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:251)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375)
	at
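The reproduction steps above can be sketched as follows. This is a hedged, minimal sketch, not the reporter's actual job: the table path and record contents are hypothetical placeholders, and the key fields are taken from the hoodie.properties shown in the issue.

```python
# Hypothetical table location; the real path is not given in the issue.
base_path = "s3://my-bucket/performer_ride_join_table"

repro_options = {
    # Step 1: a writer matching the table's hoodie.properties.
    "hoodie.table.name": "performer_ride_join_table",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "performer_id,ride_id",
    "hoodie.datasource.write.precombine.field": "source_ts_ms",
    "hoodie.datasource.write.keygenerator.class":
        "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
    # Step 2: inline-compact after every single delta commit, i.e. the key
    # named by HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS.
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "1",
    "hoodie.clean.automatic": "true",
}

# Step 3: insert a single record. With a SparkSession `spark` in scope:
# record = [{"performer_id": 1, "ride_id": 1, "source_ts_ms": 1696872459000}]
# spark.createDataFrame(record).write.format("hudi") \
#     .options(**repro_options).mode("append").save(base_path)
```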