Re: [I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]

2024-04-08 Thread via GitHub


nsivabalan commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-2044035691

   hey @ad1happy2go: any follow-ups on this? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]

2023-10-31 Thread via GitHub


rahil-c commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1788226880

   cc @yihua 





Re: [I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]

2023-10-27 Thread via GitHub


arunvasudevan commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1783346605

   @ad1happy2go Messaged you on Hudi Slack. We can discuss this issue further there. Thanks!





Re: [I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]

2023-10-23 Thread via GitHub


ad1happy2go commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1776508692

   @arunvasudevan Are you on the Hudi Slack? If so, could you message me there so we can set up a call to understand the issue better? Thanks.





Re: [I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]

2023-10-19 Thread via GitHub


arunvasudevan commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1771769666

   @ad1happy2go Let me know if you need any more info.





Re: [I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]

2023-10-17 Thread via GitHub


arunvasudevan commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1767272065

   Yes, I checked the archive folder and it is empty in this case.
   
   Here are the writer configurations.
   
   hoodie.datasource.hive_sync.database: 
   hoodie.datasource.hive_sync.mode: HMS
   hoodie.datasource.write.precombine.field: source_ts_ms
   hoodie.datasource.hive_sync.partition_extractor_class: org.apache.hudi.hive.NonPartitionedExtractor
   hoodie.parquet.max.file.size: 67108864
   hoodie.datasource.meta.sync.enable: true
   hoodie.datasource.hive_sync.skip_ro_suffix: true
   hoodie.metadata.enable: false
   hoodie.datasource.hive_sync.table: 
   hoodie.index.type: SIMPLE
   hoodie.clean.automatic: true
   hoodie.datasource.write.operation: upsert
   hoodie.metrics.reporter.type: CLOUDWATCH
   hoodie.datasource.hive_sync.enable: true
   hoodie.datasource.write.recordkey.field: version_id
   hoodie.table.name: ride_version
   hoodie.datasource.hive_sync.jdbcurl: jdbc:hive2://ip-:1
   hoodie.datasource.write.table.type: MERGE_ON_READ
   hoodie.simple.index.parallelism: 240
   hoodie.write.lock.dynamodb.partition_key: 
   hoodie.cleaner.policy: KEEP_LATEST_BY_HOURS
   hoodie.compact.inline: true
   hoodie.client.heartbeat.interval_in_ms: 60
   hoodie.datasource.compaction.async.enable: true
   hoodie.metrics.on: true
   hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.NonpartitionedKeyGenerator
   hoodie.cleaner.policy.failed.writes: LAZY
   hoodie.keep.max.commits: 1650
   hoodie.cleaner.hours.retained: 168
   hoodie.write.lock.dynamodb.table: peloton-prod-hudi-write-lock
   hoodie.write.lock.provider: org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
   hoodie.keep.min.commits: 1600
   hoodie.datasource.write.partitionpath.field: 
   hoodie.compact.inline.max.delta.commits: 1
   hoodie.write.concurrency.mode: optimistic_concurrency_control
   hoodie.write.lock.dynamodb.region: us-east-1
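
   For readers trying to reproduce a similar setup, here is a minimal, illustrative sketch (not taken from the issue) of how a subset of the options above could be assembled into the options map a PySpark Hudi writer takes. The keys and values are copied from the reported configuration; the commented-out write call assumes a live SparkSession and a `df`/`base_path` of your own.

   ```python
   # Illustrative sketch: a subset of the reported writer options,
   # shaped as the options dict a PySpark Hudi writer would consume.
   hudi_options = {
       "hoodie.table.name": "ride_version",
       "hoodie.datasource.write.table.type": "MERGE_ON_READ",
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.datasource.write.recordkey.field": "version_id",
       "hoodie.datasource.write.precombine.field": "source_ts_ms",
       "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
       "hoodie.index.type": "SIMPLE",
       "hoodie.compact.inline": "true",
       # The reproduce steps below set max delta commits to 1,
       # so every delta commit triggers an inline compaction.
       "hoodie.compact.inline.max.delta.commits": "1",
       "hoodie.clean.automatic": "true",
       "hoodie.cleaner.policy": "KEEP_LATEST_BY_HOURS",
       "hoodie.cleaner.hours.retained": "168",
       "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
   }

   # With a live SparkSession, the write would look roughly like:
   # df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
   ```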





Re: [I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]

2023-10-15 Thread via GitHub


ad1happy2go commented on issue #9861:
URL: https://github.com/apache/hudi/issues/9861#issuecomment-1763776089

   @arunvasudevan Thanks for raising this. If I have understood correctly, your issue is that the cleaner occasionally does not clean up a file. You mentioned that you don't see that file in any commit timeline. Did you check the archived commits too?
   
   Can you also provide your writer configurations?





[I] [SUPPORT] Facing java.util.NoSuchElementException on EMR 6.12 (Hudi 0.13) with inline compaction and cleaning on MoR tables [hudi]

2023-10-13 Thread via GitHub


arunvasudevan opened a new issue, #9861:
URL: https://github.com/apache/hudi/issues/9861

   Running the inline compaction and cleaning process for MoR tables on EMR 6.12 (i.e., Hudi 0.13), we are facing a NoSuchElementException on a file ID. The specific file is present in S3, but it does not appear in any commit timeline.
   This is a low-frequency table that receives an insert roughly once a month, and our cleaner and compactor are configured to run inline. However, the cleaner does not clean up the file, which causes the writer to fail.
   There is another file on the same path, with a different file ID, that contains the updated data.
   To resolve the issue we deleted the orphaned stale file, but it is really not clear why this is occurring.
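   
   To make "not in any commit timeline" concrete, here is a rough, hypothetical sketch of checking whether a file ID is referenced by any commit metadata. The metadata shape is heavily simplified and the helper is not a Hudi API; real `.hoodie/*.commit` and `*.deltacommit` files are JSON whose write stats reference the file IDs written by that commit.
   
   ```python
   import json
   
   def file_id_in_commits(commit_jsons, file_id):
       """Return True if the given file ID appears in any commit metadata blob.
   
       Hypothetical helper for illustration only: it simply scans the
       serialized JSON of each commit for the file ID string.
       """
       return any(file_id in blob for blob in commit_jsons)
   
   # Simplified commit metadata, loosely shaped like Hudi's write stats:
   commits = [
       json.dumps({"partitionToWriteStats": {"": [{"fileId": "abc-123"}]}}),
       json.dumps({"partitionToWriteStats": {"": [{"fileId": "def-456"}]}}),
   ]
   ```
   
   An orphaned file like the one described would be the case where this check returns False for its file ID even though the file exists in S3.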
   
   Here is the hoodie.properties file
   
   hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
   hoodie.table.type=MERGE_ON_READ
   hoodie.table.metadata.partitions=
   hoodie.table.precombine.field=source_ts_ms
   hoodie.table.partition.fields=
   hoodie.archivelog.folder=archived
   hoodie.timeline.layout.version=1
   hoodie.table.checksum=4106800621
   hoodie.datasource.write.drop.partition.columns=false
   hoodie.table.timeline.timezone=LOCAL
   hoodie.table.name=performer_ride_join_table
   hoodie.table.recordkey.fields=performer_id,ride_id
   hoodie.compaction.record.merger.strategy=eeb8d96f-b1e4-49fd-bbf8-28ac514178e5
   hoodie.datasource.write.hive_style_partitioning=false
   hoodie.partition.metafile.use.base.format=false
   
   hoodie.table.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
   hoodie.populate.meta.fields=true
   hoodie.table.base.file.format=PARQUET
   hoodie.database.name=
   hoodie.datasource.write.partitionpath.urlencode=false
   hoodie.table.version=5
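   
   When inspecting a table's `hoodie.properties` by hand, a tiny parser like the following sketch can help. It is illustrative only (not a Hudi utility) and assumes the standard Java `key=value` properties layout shown above.
   
   ```python
   def parse_properties(text):
       """Parse Java .properties-style 'key=value' lines into a dict.
   
       Illustrative helper: skips blank lines and '#' comments, and splits
       on the first '=' only, so values (e.g. class names) survive intact.
       """
       props = {}
       for line in text.splitlines():
           line = line.strip()
           if not line or line.startswith("#"):
               continue
           key, _, value = line.partition("=")
           props[key.strip()] = value.strip()
       return props
   
   # A few lines from the hoodie.properties above, as sample input:
   sample = """\
   hoodie.table.type=MERGE_ON_READ
   hoodie.table.version=5
   hoodie.table.precombine.field=source_ts_ms
   """
   ```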
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   This does not happen regularly; in the past 3 weeks it happened twice, roughly once per week.
   
   1. Create a Hudi writer with the above config
   2. Set HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS.key() -> "1"
   3. Insert a record
   
   **Expected behavior**
   
   Cleaner should have cleaned up the orphaned file.
   
   **Environment Description**
   
   * Hudi version : 0.13
   
   * Spark version : 3.4.0
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.3
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   **Additional context**
   
   This is a low-frequency table that receives an insert roughly once a month, and our cleaner and compactor are configured to run inline. However, the cleaner does not clean up the file, which causes the writer to fail.
   There is another file on the same path, with a different file ID, that contains the updated data.
   To resolve the issue we deleted the orphaned stale file, but it is really not clear why this is occurring.
   
   **Stacktrace**
   
   Error:
   23/10/09 17:27:39 ERROR Client: Application diagnostics message: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 174.0 failed 4 times, most recent failure: Lost task 0.3 in stage 174.0 (TID 30924) (ip-10-11-117-156.ec2.internal executor 28): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
   at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:336)
   at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:251)
   at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
   at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
   at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905)
   at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
   at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377)
   at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552)
   at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462)
   at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526)
   at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375)
   at