lamber-ken edited a comment on issue #1552:
URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-620246831
## The latest Spark log (master branch)

### Simplified original Spark log

```
// Upsert part
Warning: Ignoring non-spark config property: "spark.sql.hive.convertMetastoreParquet=false"
Warning: Ignoring non-spark config property: "spark.serializer=org.apache.spark.serializer.KryoSerializer"
20/04/26 17:35:58 WARN S3CryptoModuleAE: Unable to detect encryption information for object 'jar' in bucket '<storageBucket>'. Returning object without decryption.

// Start rollback
20/04/26 17:36:34 INFO HoodieWriteClient: Begin rollback of instant 20200426170940
20/04/26 17:36:34 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from <storageLocation>
20/04/26 17:36:34 INFO FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://ip-10-0-55-99.ec2.internal:8020]

// Output repeatedly
20/04/26 17:36:40 INFO FileStatusExt: Metadata Entry doesn't exist
...
20/04/26 18:19:07 INFO FileStatusExt: Metadata Entry doesn't exist

// Finish
20/04/26 18:19:08 INFO CleanActionExecutor: Total Partitions to clean : 6052, with policy KEEP_LATEST_COMMITS
20/04/26 18:19:08 INFO CleanActionExecutor: Using cleanerParallelism: 6000
20/04/26 18:19:08 INFO SparkContext: Starting job: collect at CleanActionExecutor.java:87
20/04/26 18:19:08 INFO DAGScheduler: Got job 10 (collect at CleanActionExecutor.java:87) with 6000 output partitions
20/04/26 18:19:08 INFO DAGScheduler: Final stage: ResultStage 30 (collect at CleanActionExecutor.java:87)
20/04/26 18:19:08 INFO DAGScheduler: Parents of final stage: List()
20/04/26 18:19:08 INFO DAGScheduler: Missing parents: List()
20/04/26 18:19:08 INFO DAGScheduler: Submitting ResultStage 30 (MapPartitionsRDD[63] at map at CleanActionExecutor.java:86), which has no missing parents
20/04/26 18:19:08 INFO MemoryStore: Block broadcast_20 stored as values in memory (estimated size 247.7 KB, free 1008.3 MB)
20/04/26 18:19:08 INFO MemoryStore: Block broadcast_20_piece0 stored as bytes in memory (estimated size 87.1 KB, free 1008.2 MB)
20/04/26 18:19:08 INFO BlockManagerInfo: Added broadcast_20_piece0 in memory on ip-10-0-55-99.ec2.internal:44415 (size: 87.1 KB, free: 1008.8 MB)
20/04/26 18:19:08 INFO SparkContext: Created broadcast 20 from broadcast at DAGScheduler.scala:1201
20/04/26 18:19:08 INFO DAGScheduler: Submitting 6000 missing tasks from ResultStage 30 (MapPartitionsRDD[63] at map at CleanActionExecutor.java:86) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
20/04/26 18:19:08 INFO YarnScheduler: Adding task set 30.0 with 6000 tasks
20/04/26 18:19:23 INFO YarnScheduler: Removed TaskSet 32.0, whose tasks have all completed, from pool
20/04/26 18:19:23 INFO DAGScheduler: ResultStage 32 (collect at CleanActionExecutor.java:155) finished in 3.444 s
20/04/26 18:19:23 INFO DAGScheduler: Job 11 finished: collect at CleanActionExecutor.java:155, took 8.313284 s
20/04/26 18:19:23 INFO HoodieActiveTimeline: Loaded instants [[20200425201231__clean__COMPLETED], [20200425201231__commit__COMPLETED], [==>20200426173625__clean__INFLIGHT], [20200426173625__commit__COMPLETED]]
20/04/26 18:19:24 INFO HoodieActiveTimeline: Checking for file exists ?<storageLocation>.hoodie/20200426173625.clean.inflight
20/04/26 18:19:24 INFO CSEMultipartUploadOutputStream: close closed:false <storageLocation>.hoodie/20200426173625.clean
20/04/26 18:19:24 INFO DefaultMultipartUploadDispatcher: Completed multipart upload of 1 parts 273450 bytes
20/04/26 18:19:24 INFO CSEMultipartUploadOutputStream: Finished uploading <storageLocation>.hoodie/20200426173625.clean. Elapsed seconds: 0.
20/04/26 18:19:24 INFO HoodieActiveTimeline: Create new file for toInstant ?<storageLocation>.hoodie/20200426173625.clean
20/04/26 18:19:24 INFO CleanActionExecutor: Marked clean started on 20200426173625 as complete
20/04/26 18:19:24 INFO AbstractHoodieWriteClient: Committed 20200426173625
20/04/26 18:19:24 INFO HoodieSparkSqlWriter$: Commit 20200426173625 successful!
20/04/26 18:19:28 INFO SparkContext: Invoking stop() from shutdown hook
20/04/26 18:19:28 INFO SparkUI: Stopped Spark web UI at ip
20/04/26 18:19:28 INFO YarnClientSchedulerBackend: Interrupting monitor thread
20/04/26 18:19:28 INFO YarnClientSchedulerBackend: Shutting down all executors
20/04/26 18:19:28 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/04/26 18:19:28 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
20/04/26 18:19:28 INFO YarnClientSchedulerBackend: Stopped
20/04/26 18:19:28 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/04/26 18:19:28 INFO MemoryStore: MemoryStore cleared
20/04/26 18:19:28 INFO BlockManager: BlockManager stopped
20/04/26 18:19:28 INFO BlockManagerMaster: BlockManagerMaster stopped
20/04/26 18:19:28 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/04/26 18:19:28 INFO SparkContext: Successfully stopped SparkContext
20/04/26 18:19:28 INFO ShutdownHookManager: Shutdown hook called
20/04/26 18:19:28 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-e12c26c7-150a-4837-8fb2-72db585dd2f3
20/04/26 18:19:28 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-1a4b587a-a177-4cab-8f93-6849d64386d9
Command exiting with ret '0'
```

#### Analysis

![image](https://user-images.githubusercontent.com/20113411/80422808-6b8be100-8911-11ea-9dd1-bfa6b0bafd9c.png)

### Analysis about step 2

`Rollback` took 42 minutes, repeatedly logging `FileStatusExt: Metadata Entry doesn't exist`.
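The clean that follows the rollback fans out roughly one task per partition (`Total Partitions to clean : 6052`, `Using cleanerParallelism: 6000`), so the partition count directly drives the size of that stage. Below is a hedged sketch of the standard writer options that govern this stage; the policy matches the log, but the table name, key fields, paths, and retention value are placeholders, not taken from this job's config:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()
val df = spark.read.parquet("<inputPath>") // placeholder input

// Cleaner knobs matching the CleanActionExecutor lines above; values other
// than the policy are illustrative only.
df.write.format("org.apache.hudi")
  .option("hoodie.table.name", "my_table")                     // placeholder
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.recordkey.field", "id")     // placeholder
  .option("hoodie.datasource.write.precombine.field", "ts")    // placeholder
  .option("hoodie.datasource.write.partitionpath.field", "dt") // placeholder
  .option("hoodie.cleaner.policy", "KEEP_LATEST_COMMITS")      // policy from the log
  .option("hoodie.cleaner.commits.retained", "10")             // example value
  .option("hoodie.cleaner.parallelism", "6000")                // cap on clean tasks; 6000 in the log
  .mode(SaveMode.Append)
  .save("<storageLocation>")                                   // placeholder path
```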
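Separately, the two `Ignoring non-spark config property` warnings in the upsert part mean both settings were silently dropped: the surrounding quotes were passed through as part of the property name, so the names no longer start with `spark.`. Passing them unquoted as `--conf key=value` pairs to `spark-submit` avoids this; a minimal sketch of setting the same two properties programmatically instead, so shell quoting cannot mangle them:

```scala
import org.apache.spark.sql.SparkSession

// Names and values are exactly the ones the warnings show being ignored.
val spark = SparkSession.builder()
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.sql.hive.convertMetastoreParquet", "false")
  .getOrCreate()
```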