KarthickAN opened a new issue #2970: URL: https://github.com/apache/hudi/issues/2970
Hi, I keep getting the following error intermittently, and I'm not sure what causes it. There may be two different Hudi jobs running in parallel and writing to the same bucket. Could that be the cause? Also, please guide me in resolving the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o318.save.
: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20210520040253
    at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:62)
    at org.apache.hudi.table.action.commit.UpsertCommitActionExecutor.execute(UpsertCommitActionExecutor.java:45)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.upsert(HoodieCopyOnWriteTable.java:88)
    at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:193)
    at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:260)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException
    at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:384)
    at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:139)
    at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.execute(BaseCommitActionExecutor.java:89)
    at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:55)
    ... 38 more

Below is my Hudi config:

SmallFileSize = 104857600
MaxFileSize = 125829120
RecordSize = 35
CompressionRatio = 5
InsertSplitSize = 3500000
IndexBloomNumEntries = 1500000
KeyGenClass = org.apache.hudi.keygen.ComplexKeyGenerator
RecordKeyFields = sourceid,sourceassetid,sourceeventid,value,timestamp
TableType = COPY_ON_WRITE
PartitionPathFields = date,sourceid
HiveStylePartitioning = True
WriteOperation = upsert
CompressionCodec = snappy
CommitsRetained = 1
CombineBeforeInsert = True
PrecombineField = timestamp
InsertDropDuplicates = False
InsertShuffleParallelism = 100

Environment Description

Hudi version : 0.6.0
Spark version : 2.4.3
Hadoop version : 2.8.5-amzn-1
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : No. Running on AWS Glue
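For reference, the config names above are the reporter's own labels. A minimal PySpark sketch of how they might map onto Hudi datasource write options follows. The option keys are assumed from the Hudi 0.6.0 configuration reference and are not taken from the issue. `TABLE_NAME`, `write_hudi` and the base path are hypothetical placeholders.

```python
# Sketch: the issue's config expressed as Hudi datasource write options.
# Option keys are ASSUMED from the Hudi 0.6.0 configuration reference;
# TABLE_NAME and write_hudi are hypothetical names, not from the issue.

TABLE_NAME = "my_table"  # hypothetical; the issue does not name the table

HUDI_OPTIONS = {
    "hoodie.table.name": TABLE_NAME,
    # File sizing: SmallFileSize, MaxFileSize, RecordSize, CompressionRatio
    "hoodie.parquet.small.file.limit": "104857600",
    "hoodie.parquet.max.file.size": "125829120",
    "hoodie.copyonwrite.record.size.estimate": "35",
    "hoodie.parquet.compression.ratio": "5",
    "hoodie.copyonwrite.insert.split.size": "3500000",
    "hoodie.index.bloom.num_entries": "1500000",
    # Record keys, partitioning and table type
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.recordkey.field": "sourceid,sourceassetid,sourceeventid,value,timestamp",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.partitionpath.field": "date,sourceid",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    # Write behaviour, compression and cleaning
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.parquet.compression.codec": "snappy",
    "hoodie.cleaner.commits.retained": "1",
    "hoodie.combine.before.insert": "true",
    "hoodie.datasource.write.precombine.field": "timestamp",
    "hoodie.datasource.write.insert.drop.duplicates": "false",
    "hoodie.insert.shuffle.parallelism": "100",
}


def write_hudi(df, base_path):
    """Upsert a Spark DataFrame into the Hudi table at base_path (e.g. an s3:// URI)."""
    (df.write
       .format("org.apache.hudi")  # Hudi's Spark DataSource package name
       .options(**HUDI_OPTIONS)
       .mode("append")             # append mode is used for upserts into an existing table
       .save(base_path))
```

Keeping every value as a string mirrors how Spark passes datasource options through to Hudi. Call it as `write_hudi(df, "s3://<bucket>/<path>")` with the real table location.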