KnightChess created HUDI-5797:
---------------------------------

             Summary: bulk insert as row will throw error without mdt init
                 Key: HUDI-5797
                 URL: https://issues.apache.org/jira/browse/HUDI-5797
             Project: Apache Hudi
          Issue Type: Bug
          Components: spark
            Reporter: KnightChess
            Assignee: KnightChess


`bulk insert as row` does not call initTable first, so metadata table (mdt) 
initialization is triggered when the result is committed after the write, in 
the same job. This initialization lists the file system to build the mdt, so 
it can pick up orphan or corrupt files. For example, if a writer is killed by 
the RM before it flushes, the parquet file size may be 0, which triggers the 
following error when the mdt is initialized.

 
{code:java}
Job aborted due to stage failure: Task 1 in stage 13.0 failed 4 times, most recent failure: Lost task 1.3 in stage 13.0 (TID 102100) (bigdata-nmg-hdp10339.nmg01.diditaxi.com executor 832): java.lang.IllegalStateException
        at org.apache.hudi.common.util.ValidationUtils.checkState(ValidationUtils.java:53)
        at org.apache.hudi.metadata.HoodieMetadataPayload.lambda$null$4(HoodieMetadataPayload.java:328)
        at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321)
        at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
        at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1683)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.apache.hudi.metadata.HoodieMetadataPayload.lambda$createPartitionFilesRecord$5(HoodieMetadataPayload.java:323)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
        at org.apache.hudi.metadata.HoodieMetadataPayload.createPartitionFilesRecord(HoodieMetadataPayload.java:321)
        at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.lambda$getFilesPartitionRecords$f70c2081$1(HoodieBackedTableMetadataWriter.java:1105)
        at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1892)
        at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1249)
        at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1249)
        at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2261)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1463)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
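
As a rough diagnostic for the zero-length file case described above, the 
following is a minimal sketch that lists empty .parquet files under a table 
directory. It assumes the directory is reachable through java.nio (a local or 
mounted path); for a table on HDFS the same check would have to go through the 
Hadoop file system APIs instead. The class and method names 
(ZeroLengthParquetFinder, findZeroLengthParquetFiles) are hypothetical and are 
not part of Hudi.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a Hudi API: finds zero-length .parquet files
// under a directory that is visible through the local file system.
class ZeroLengthParquetFinder {

    // Walks the tree below root and returns every regular file whose name
    // ends with ".parquet" and whose size is 0 bytes, in sorted order.
    static List<Path> findZeroLengthParquetFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                .filter(p -> p.toFile().length() == 0L)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // Usage: java ZeroLengthParquetFinder /path/to/table
    // Prints one suspicious file per line; defaults to the current directory.
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findZeroLengthParquetFiles(root)) {
            System.out.println(p);
        }
    }
}
```

Any file this reports is a candidate for the orphan or unflushed file that the 
file-system-based mdt init would pick up. Whether such a file is safe to remove 
still has to be checked against the table's timeline.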
 
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
