KarthickAN opened a new issue #2970:
URL: https://github.com/apache/hudi/issues/2970


   Hi,
   I keep getting the following error intermittently, and I'm not sure what causes it. There may be two different Hudi jobs running in parallel and writing to the same bucket. Could that be the cause? Please also guide me in resolving this error.
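   Roughly, each job performs a write like the sketch below (names and paths are placeholders; the full option set is listed further down):

       from pyspark.sql import SparkSession

       spark = SparkSession.builder.appName("hudi-writer").getOrCreate()

       # Hypothetical input; each job reads its own batch of data.
       df = spark.read.parquet("s3://my-bucket/incoming/batch-a/")

       hudi_options = {
           "hoodie.table.name": "my_table",                # placeholder; full config below
           "hoodie.datasource.write.operation": "upsert",
       }

       (df.write
          .format("hudi")
          .options(**hudi_options)
          .mode("append")
          .save("s3://my-bucket/hudi/my_table/"))  # both jobs target the same base path

   Here is the full stack trace: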
   
   py4j.protocol.Py4JJavaError: An error occurred while calling o318.save.
   : org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20210520040253
        at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:62)
        at org.apache.hudi.table.action.commit.UpsertCommitActionExecutor.execute(UpsertCommitActionExecutor.java:45)
        at org.apache.hudi.table.HoodieCopyOnWriteTable.upsert(HoodieCopyOnWriteTable.java:88)
        at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:193)
        at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:260)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.IllegalArgumentException
        at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
        at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
        at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:384)
        at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:139)
        at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.execute(BaseCommitActionExecutor.java:89)
        at org.apache.hudi.table.action.commit.WriteHelper.write(WriteHelper.java:55)
        ... 38 more
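
   From the trace, the IllegalArgumentException comes from a precondition check (ValidationUtils.checkArgument) while the commit is moved from the requested to the inflight state on the timeline. To look for leftover or conflicting instants, the timeline files under .hoodie/ can be listed with something like this (a rough boto3 sketch; bucket and prefix are placeholders):

       # Rough diagnostic sketch: list the Hudi timeline files under .hoodie/
       # on S3 to spot duplicate or half-written instants.
       import boto3

       s3 = boto3.client("s3")
       paginator = s3.get_paginator("list_objects_v2")
       for page in paginator.paginate(Bucket="my-bucket", Prefix="hudi/my_table/.hoodie/"):
           for obj in page.get("Contents", []):
               name = obj["Key"].rsplit("/", 1)[-1]
               # Instant files look like <time>.commit.requested, <time>.inflight,
               # <time>.commit for copy-on-write commits.
               if name and name[0].isdigit():
                   print(name, obj["Size"])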
   
   Below is my Hudi config:
   
   SmallFileSize = 104857600
   MaxFileSize = 125829120
   RecordSize = 35
   CompressionRatio = 5
   InsertSplitSize = 3500000
   IndexBloomNumEntries = 1500000
   KeyGenClass = org.apache.hudi.keygen.ComplexKeyGenerator
   RecordKeyFields = sourceid,sourceassetid,sourceeventid,value,timestamp
   TableType = COPY_ON_WRITE
   PartitionPathFields = date,sourceid
   HiveStylePartitioning = True
   WriteOperation = upsert
   CompressionCodec = snappy
   CommitsRetained = 1
   CombineBeforeInsert = True
   PrecombineField = timestamp
   InsertDropDuplicates = False
   InsertShuffleParallelism = 100
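
   If it helps, this is roughly how those settings map onto the hoodie.* options passed to the writer (a reconstruction; the table name is a placeholder, and the key names are the standard Hudi option names as I understand them):

       hudi_options = {
           "hoodie.table.name": "my_table",  # placeholder
           "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
           "hoodie.datasource.write.operation": "upsert",
           "hoodie.datasource.write.recordkey.field": "sourceid,sourceassetid,sourceeventid,value,timestamp",
           "hoodie.datasource.write.partitionpath.field": "date,sourceid",
           "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
           "hoodie.datasource.write.precombine.field": "timestamp",
           "hoodie.datasource.write.hive_style_partitioning": "true",
           "hoodie.datasource.write.insert.drop.duplicates": "false",
           "hoodie.combine.before.insert": "true",
           "hoodie.parquet.small.file.limit": "104857600",
           "hoodie.parquet.max.file.size": "125829120",
           "hoodie.parquet.compression.codec": "snappy",
           "hoodie.parquet.compression.ratio": "5",
           "hoodie.copyonwrite.record.size.estimate": "35",
           "hoodie.copyonwrite.insert.split.size": "3500000",
           "hoodie.index.bloom.num_entries": "1500000",
           "hoodie.cleaner.commits.retained": "1",
           "hoodie.insert.shuffle.parallelism": "100",
       }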
   
   Environment Description
   
   Hudi version : 0.6.0
   
   Spark version : 2.4.3
   
   Hadoop version : 2.8.5-amzn-1
   
   Storage (HDFS/S3/GCS..) : S3
   
   Running on Docker? (yes/no) : No. Running on AWS Glue

