liujinhui1994 opened a new issue #3280: URL: https://github.com/apache/hudi/issues/3280
**Describe the problem you faced**

I use Spark Structured Streaming to consume Kafka and write to Hudi. With other Hudi versions, such as 0.7 or 0.8, the program runs continuously and stably, but with the latest master every run fails after one batch.

**Environment Description**

* Hudi version : latest master
* Spark version : 2.4.5
* Hive version : not involved
* Hadoop version : 3.1.1
* Storage (HDFS/S3/GCS..) : OBS
* Running on Docker? (yes/no) : no

**Additional context**

1. With the latest master version, writing Hudi data to an empty folder succeeds for the first batch, but the second batch fails.
2. Next, I restarted the YARN task and tried again.
3. The program consumes one more batch, but the batch after that fails again.
4. I have written more than 3000 tables in production using lower versions of Hudi (0.8 etc.), which shows the program itself is fine; a recent change in Hudi is likely the cause.
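For context, a job like the one described above is typically submitted with the Hudi Spark bundle on the classpath of both the driver and the executors. The sketch below is illustrative only (the deploy mode, bundle version, and jar name are assumptions, not the reporter's actual command); the stack trace further down shows Spark failing to resolve the `org.apache.hudi` data source, which is the symptom of that bundle being absent at lookup time.

```sh
# Hypothetical submission sketch; coordinates assume Spark 2.4 / Scala 2.11.
# The hudi-spark-bundle jar must be visible to driver and executors, otherwise
# Spark's data source lookup for "org.apache.hudi" fails with
# ClassNotFoundException: org.apache.hudi.DefaultSource.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.8.0 \
  --class cn.t3go.bigdata.mains.IncrementBizKafka \
  increment-biz-kafka.jar
```

Note that since the same packaging reportedly works on 0.7/0.8, the question is why the master-built bundle stops resolving after the first batch, not simply whether the jar was supplied.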
I tried to locate the problem, but did not find the problem **Stacktrace** `2021-07-15 18:03:30,437 | INFO | [block-manager-slave-async-thread-pool-39] | Removing RDD 31 | org.apache.spark.storage.BlockManager.logInfo(Logging.scala:54) 2021-07-15 18:03:30,445 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Removing RDD 21 from persistence list | org.apache.spark.rdd.MapPartitionsRDD.logInfo(Logging.scala:54) 2021-07-15 18:03:30,446 | INFO | [block-manager-slave-async-thread-pool-42] | Removing RDD 21 | org.apache.spark.storage.BlockManager.logInfo(Logging.scala:54) 2021-07-15 18:03:30,460 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum | org.apache.hadoop.conf.Configuration.deprecation.logDeprecation(Configuration.java:1409) 2021-07-15 18:03:30,482 | WARN | [MetricsSender] | Unable to load JDK7 types (java.nio.file.Path): no Java7 type support added | mrs.shaded.provider.com.fasterxml.jackson.databind.ext.Java7Handlers.<clinit>(Java7Handlers.java:29) 2021-07-15 18:03:30,497 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Writing atomically to obs://t3-data-lake-std-prod/datalake/checkpoint/dls/kafka/t3_ts_android_device/day/commits/2 using temp file obs://t3-data-lake-std-prod/datalake/checkpoint/dls/kafka/t3_ts_android_device/day/commits/.2.5817bc89-610b-43d0-be15-c16daffbda17.tmp | org.apache.spark.sql.execution.streaming.CheckpointFileManager.logInfo(Logging.scala:54) 2021-07-15 18:03:30,513 | INFO | [block-manager-slave-async-thread-pool-48] | Removing RDD 31 | org.apache.spark.storage.BlockManager.logInfo(Logging.scala:54) 2021-07-15 18:03:30,602 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = 
d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Renamed temp file obs://t3-data-lake-std-prod/datalake/checkpoint/dls/kafka/t3_ts_android_device/day/commits/.2.5817bc89-610b-43d0-be15-c16daffbda17.tmp to obs://t3-data-lake-std-prod/datalake/checkpoint/dls/kafka/t3_ts_android_device/day/commits/2 | org.apache.spark.sql.execution.streaming.CheckpointFileManager.logInfo(Logging.scala:54) 2021-07-15 18:03:30,626 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Streaming query made progress: { "id" : "6c7bb949-bcb6-494a-a576-e64b05643c31", "runId" : "d6c5cb5d-7e6c-45b7-935e-0308327a0bab", "name" : null, "timestamp" : "2021-07-15T10:03:00.705Z", "batchId" : 2, "numInputRows" : 58847, "processedRowsPerSecond" : 1968.2587464044418, "durationMs" : { "addBatch" : 28029, "getBatch" : 7, "getEndOffset" : 0, "queryPlanning" : 302, "setOffsetRange" : 646, "triggerExecution" : 29898, "walCommit" : 511 }, "stateOperators" : [ ], "sources" : [ { "description" : "KafkaV2[Subscribe[t3_ts_android_device]]", "startOffset" : { "t3_ts_android_device" : { "2" : 524155, "5" : 524155, "4" : 524156, "7" : 524155, "1" : 524155, "3" : 524154, "6" : 524155, "0" : 524154 } }, "endOffset" : { "t3_ts_android_device" : { "2" : 531507, "5" : 531508, "4" : 531509, "7" : 531507, "1" : 531509, "3" : 531506, "6" : 531509, "0" : 531506 } }, "numInputRows" : 58847, "processedRowsPerSecond" : 1968.2587464044418 } ], "sink" : { "description" : "ForeachBatchSink" } } | org.apache.spark.sql.execution.streaming.MicroBatchExecution.logInfo(Logging.scala:54) 2021-07-15 18:03:30,627 | WARN | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Current batch is falling behind. 
The trigger interval is 10000 milliseconds, but spent 29922 milliseconds | org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.logWarning(Logging.scala:66) 2021-07-15 18:03:30,630 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | [Consumer clientId=consumer-1, groupId=spark-kafka-source-6c362da9-8740-4aab-92fc-5eb97a666e02-679833050-driver-0] Resetting offset for partition t3_ts_android_device-4 to offset 531556. | org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetIfNeeded(Fetcher.java:583) 2021-07-15 18:03:30,630 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | [Consumer clientId=consumer-1, groupId=spark-kafka-source-6c362da9-8740-4aab-92fc-5eb97a666e02-679833050-driver-0] Resetting offset for partition t3_ts_android_device-3 to offset 531554. | org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetIfNeeded(Fetcher.java:583) 2021-07-15 18:03:30,630 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | [Consumer clientId=consumer-1, groupId=spark-kafka-source-6c362da9-8740-4aab-92fc-5eb97a666e02-679833050-driver-0] Resetting offset for partition t3_ts_android_device-5 to offset 531556. | org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetIfNeeded(Fetcher.java:583) 2021-07-15 18:03:30,631 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | [Consumer clientId=consumer-1, groupId=spark-kafka-source-6c362da9-8740-4aab-92fc-5eb97a666e02-679833050-driver-0] Resetting offset for partition t3_ts_android_device-2 to offset 531555. 
| org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetIfNeeded(Fetcher.java:583) 2021-07-15 18:03:30,631 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | [Consumer clientId=consumer-1, groupId=spark-kafka-source-6c362da9-8740-4aab-92fc-5eb97a666e02-679833050-driver-0] Resetting offset for partition t3_ts_android_device-6 to offset 531556. | org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetIfNeeded(Fetcher.java:583) 2021-07-15 18:03:30,631 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | [Consumer clientId=consumer-1, groupId=spark-kafka-source-6c362da9-8740-4aab-92fc-5eb97a666e02-679833050-driver-0] Resetting offset for partition t3_ts_android_device-0 to offset 531554. | org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetIfNeeded(Fetcher.java:583) 2021-07-15 18:03:30,633 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | [Consumer clientId=consumer-1, groupId=spark-kafka-source-6c362da9-8740-4aab-92fc-5eb97a666e02-679833050-driver-0] Resetting offset for partition t3_ts_android_device-7 to offset 531555. | org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetIfNeeded(Fetcher.java:583) 2021-07-15 18:03:30,633 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | [Consumer clientId=consumer-1, groupId=spark-kafka-source-6c362da9-8740-4aab-92fc-5eb97a666e02-679833050-driver-0] Resetting offset for partition t3_ts_android_device-1 to offset 531556. 
| org.apache.kafka.clients.consumer.internals.Fetcher.resetOffsetIfNeeded(Fetcher.java:583) 2021-07-15 18:03:30,678 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Writing atomically to obs://t3-data-lake-std-prod/datalake/checkpoint/dls/kafka/t3_ts_android_device/day/offsets/3 using temp file obs://t3-data-lake-std-prod/datalake/checkpoint/dls/kafka/t3_ts_android_device/day/offsets/.3.9893a7ac-d49f-4cff-a2db-e7f4d1dfd266.tmp | org.apache.spark.sql.execution.streaming.CheckpointFileManager.logInfo(Logging.scala:54) 2021-07-15 18:03:30,824 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Renamed temp file obs://t3-data-lake-std-prod/datalake/checkpoint/dls/kafka/t3_ts_android_device/day/offsets/.3.9893a7ac-d49f-4cff-a2db-e7f4d1dfd266.tmp to obs://t3-data-lake-std-prod/datalake/checkpoint/dls/kafka/t3_ts_android_device/day/offsets/3 | org.apache.spark.sql.execution.streaming.CheckpointFileManager.logInfo(Logging.scala:54) 2021-07-15 18:03:30,824 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Committed offsets for batch 3. 
Metadata OffsetSeqMetadata(0,1626343410634,Map(spark.sql.streaming.stateStore.providerClass -> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider, spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion -> 2, spark.sql.streaming.multipleWatermarkPolicy -> min, spark.sql.streaming.aggregation.stateFormatVersion -> 2, spark.sql.shuffle.partitions -> 200)) | org.apache.spark.sql.execution.streaming.MicroBatchExecution.logInfo(Logging.scala:54) 2021-07-15 18:03:30,883 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | The key carbon.enable.mv with value true added in the session param | org.apache.carbondata.core.util.SessionParams.addProperty(SessionParams.java:101) 2021-07-15 18:03:30,885 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Trying to connect to metastore with URI thrift://node-master1AxuE.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com:9083 | org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:582) 2021-07-15 18:03:30,886 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Opened a connection to metastore, current connections: 1 | org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:671) 2021-07-15 18:03:30,893 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Connected to metastore. 
| org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:765) 2021-07-15 18:03:30,893 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=skylake (auth:SIMPLE) retries=1 delay=1 lifetime=0 | org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:99) 2021-07-15 18:03:32,870 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | The key carbon.enable.mv with value true added in the session param | org.apache.carbondata.core.util.SessionParams.addProperty(SessionParams.java:101) 2021-07-15 18:03:34,651 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Partitions added: Map() | org.apache.spark.sql.kafka010.KafkaMicroBatchReader.logInfo(Logging.scala:54) 2021-07-15 18:03:34,676 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | The key carbon.enable.mv with value true added in the session param | org.apache.carbondata.core.util.SessionParams.addProperty(SessionParams.java:101) 2021-07-15 18:03:36,631 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Starting job: getCallSite at StreamExecution.scala:173 | org.apache.spark.SparkContext.logInfo(Logging.scala:54) 2021-07-15 18:03:36,632 | INFO | [dag-scheduler-event-loop] | Registering RDD 45 (getCallSite at StreamExecution.scala:173) as input to shuffle 3 | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,633 | INFO | [dag-scheduler-event-loop] | Got job 8 (getCallSite at StreamExecution.scala:173) with 1 output partitions | 
org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,633 | INFO | [dag-scheduler-event-loop] | Final stage: ResultStage 13 (getCallSite at StreamExecution.scala:173) | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,633 | INFO | [dag-scheduler-event-loop] | Parents of final stage: List(ShuffleMapStage 12) | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,633 | INFO | [dag-scheduler-event-loop] | Missing parents: List(ShuffleMapStage 12) | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,634 | INFO | [dag-scheduler-event-loop] | Submitting ShuffleMapStage 12 (MapPartitionsRDD[45] at getCallSite at StreamExecution.scala:173), which has no missing parents | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,638 | INFO | [dag-scheduler-event-loop] | Block broadcast_11 stored as values in memory (estimated size 39.5 KB, free 94.5 MB) | org.apache.spark.storage.memory.MemoryStore.logInfo(Logging.scala:54) 2021-07-15 18:03:36,641 | INFO | [dag-scheduler-event-loop] | Block broadcast_11_piece0 stored as bytes in memory (estimated size 18.7 KB, free 94.4 MB) | org.apache.spark.storage.memory.MemoryStore.logInfo(Logging.scala:54) 2021-07-15 18:03:36,641 | INFO | [dispatcher-event-loop-1] | Added broadcast_11_piece0 in memory on node-group-1rXLd.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com:22685 (size: 18.7 KB, free: 94.5 MB) | org.apache.spark.storage.BlockManagerInfo.logInfo(Logging.scala:54) 2021-07-15 18:03:36,642 | INFO | [dag-scheduler-event-loop] | Created broadcast 11 from broadcast at DAGScheduler.scala:1232 | org.apache.spark.SparkContext.logInfo(Logging.scala:54) 2021-07-15 18:03:36,642 | INFO | [dag-scheduler-event-loop] | Submitting 8 missing tasks from ShuffleMapStage 12 (MapPartitionsRDD[45] at getCallSite at StreamExecution.scala:173) (first 15 tasks are for partitions Vector(0, 1, 2, 
3, 4, 5, 6, 7)) | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,642 | INFO | [dag-scheduler-event-loop] | Adding task set 12.0 with 8 tasks | org.apache.spark.scheduler.cluster.YarnClusterScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,643 | INFO | [dispatcher-event-loop-0] | Starting task 1.0 in stage 12.0 (TID 39, node-group-1eKyO.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com, executor 4, partition 1, PROCESS_LOCAL, 8871 bytes) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,644 | INFO | [dispatcher-event-loop-0] | Starting task 0.0 in stage 12.0 (TID 40, node-group-1ViRs.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com, executor 1, partition 0, PROCESS_LOCAL, 8871 bytes) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,644 | INFO | [dispatcher-event-loop-0] | Starting task 6.0 in stage 12.0 (TID 41, node-group-198O4.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com, executor 2, partition 6, PROCESS_LOCAL, 8871 bytes) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,644 | INFO | [dispatcher-event-loop-0] | Starting task 2.0 in stage 12.0 (TID 42, node-group-1jPXG.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com, executor 3, partition 2, PROCESS_LOCAL, 8871 bytes) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,644 | INFO | [dispatcher-event-loop-0] | Starting task 3.0 in stage 12.0 (TID 43, node-group-1eKyO.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com, executor 4, partition 3, PROCESS_LOCAL, 8871 bytes) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,644 | INFO | [dispatcher-event-loop-0] | Starting task 4.0 in stage 12.0 (TID 44, node-group-1ViRs.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com, executor 1, partition 4, PROCESS_LOCAL, 8871 bytes) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,645 | INFO | 
[dispatcher-event-loop-0] | Starting task 7.0 in stage 12.0 (TID 45, node-group-198O4.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com, executor 2, partition 7, PROCESS_LOCAL, 8871 bytes) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,645 | INFO | [dispatcher-event-loop-0] | Starting task 5.0 in stage 12.0 (TID 46, node-group-1jPXG.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com, executor 3, partition 5, PROCESS_LOCAL, 8871 bytes) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,653 | INFO | [dispatcher-event-loop-0] | Added broadcast_11_piece0 in memory on node-group-1eKyO.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com:22816 (size: 18.7 KB, free: 2004.6 MB) | org.apache.spark.storage.BlockManagerInfo.logInfo(Logging.scala:54) 2021-07-15 18:03:36,654 | INFO | [dispatcher-event-loop-1] | Added broadcast_11_piece0 in memory on node-group-198O4.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com:22613 (size: 18.7 KB, free: 2004.6 MB) | org.apache.spark.storage.BlockManagerInfo.logInfo(Logging.scala:54) 2021-07-15 18:03:36,655 | INFO | [dispatcher-event-loop-0] | Added broadcast_11_piece0 in memory on node-group-1jPXG.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com:22740 (size: 18.7 KB, free: 2004.6 MB) | org.apache.spark.storage.BlockManagerInfo.logInfo(Logging.scala:54) 2021-07-15 18:03:36,669 | INFO | [task-result-getter-2] | Finished task 6.0 in stage 12.0 (TID 41) in 25 ms on node-group-198O4.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com (executor 2) (1/8) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,669 | INFO | [task-result-getter-0] | Finished task 3.0 in stage 12.0 (TID 43) in 25 ms on node-group-1eKyO.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com (executor 4) (2/8) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,670 | INFO | [task-result-getter-1] | Finished task 5.0 in stage 12.0 (TID 46) in 25 ms on 
node-group-1jPXG.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com (executor 3) (3/8) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,670 | INFO | [task-result-getter-3] | Finished task 7.0 in stage 12.0 (TID 45) in 25 ms on node-group-198O4.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com (executor 2) (4/8) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,670 | INFO | [task-result-getter-2] | Finished task 1.0 in stage 12.0 (TID 39) in 27 ms on node-group-1eKyO.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com (executor 4) (5/8) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,670 | INFO | [task-result-getter-0] | Finished task 2.0 in stage 12.0 (TID 42) in 26 ms on node-group-1jPXG.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com (executor 3) (6/8) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,678 | INFO | [dispatcher-event-loop-1] | Added broadcast_11_piece0 in memory on node-group-1ViRs.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com:22798 (size: 18.7 KB, free: 2004.6 MB) | org.apache.spark.storage.BlockManagerInfo.logInfo(Logging.scala:54) 2021-07-15 18:03:36,694 | INFO | [task-result-getter-1] | Finished task 0.0 in stage 12.0 (TID 40) in 50 ms on node-group-1ViRs.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com (executor 1) (7/8) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,694 | INFO | [task-result-getter-3] | Finished task 4.0 in stage 12.0 (TID 44) in 50 ms on node-group-1ViRs.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com (executor 1) (8/8) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,694 | INFO | [task-result-getter-3] | Removed TaskSet 12.0, whose tasks have all completed, from pool | org.apache.spark.scheduler.cluster.YarnClusterScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,695 | INFO | [dag-scheduler-event-loop] | ShuffleMapStage 12 
(getCallSite at StreamExecution.scala:173) finished in 0.059 s | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,695 | INFO | [dag-scheduler-event-loop] | looking for newly runnable stages | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,695 | INFO | [dag-scheduler-event-loop] | running: Set() | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,695 | INFO | [dag-scheduler-event-loop] | waiting: Set(ResultStage 13) | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,695 | INFO | [dag-scheduler-event-loop] | failed: Set() | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,695 | INFO | [dag-scheduler-event-loop] | Submitting ResultStage 13 (MapPartitionsRDD[48] at getCallSite at StreamExecution.scala:173), which has no missing parents | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,696 | INFO | [dag-scheduler-event-loop] | Block broadcast_12 stored as values in memory (estimated size 10.5 KB, free 94.4 MB) | org.apache.spark.storage.memory.MemoryStore.logInfo(Logging.scala:54) 2021-07-15 18:03:36,706 | INFO | [dag-scheduler-event-loop] | Block broadcast_12_piece0 stored as bytes in memory (estimated size 4.9 KB, free 94.4 MB) | org.apache.spark.storage.memory.MemoryStore.logInfo(Logging.scala:54) 2021-07-15 18:03:36,707 | INFO | [dispatcher-event-loop-1] | Added broadcast_12_piece0 in memory on node-group-1rXLd.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com:22685 (size: 4.9 KB, free: 94.5 MB) | org.apache.spark.storage.BlockManagerInfo.logInfo(Logging.scala:54) 2021-07-15 18:03:36,707 | INFO | [dag-scheduler-event-loop] | Created broadcast 12 from broadcast at DAGScheduler.scala:1232 | org.apache.spark.SparkContext.logInfo(Logging.scala:54) 2021-07-15 18:03:36,708 | INFO | [dag-scheduler-event-loop] | Submitting 1 missing tasks from ResultStage 13 
(MapPartitionsRDD[48] at getCallSite at StreamExecution.scala:173) (first 15 tasks are for partitions Vector(0)) | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,708 | INFO | [dag-scheduler-event-loop] | Adding task set 13.0 with 1 tasks | org.apache.spark.scheduler.cluster.YarnClusterScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,709 | INFO | [dispatcher-event-loop-0] | Starting task 0.0 in stage 13.0 (TID 47, node-group-1eKyO.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com, executor 4, partition 0, NODE_LOCAL, 7833 bytes) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,717 | INFO | [dispatcher-event-loop-1] | Added broadcast_12_piece0 in memory on node-group-1eKyO.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com:22816 (size: 4.9 KB, free: 2004.6 MB) | org.apache.spark.storage.BlockManagerInfo.logInfo(Logging.scala:54) 2021-07-15 18:03:36,725 | INFO | [dispatcher-event-loop-0] | Asked to send map output locations for shuffle 3 to *.*.42.204:50614 | org.apache.spark.MapOutputTrackerMasterEndpoint.logInfo(Logging.scala:54) 2021-07-15 18:03:36,748 | INFO | [task-result-getter-2] | Finished task 0.0 in stage 13.0 (TID 47) in 40 ms on node-group-1eKyO.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com (executor 4) (1/1) | org.apache.spark.scheduler.TaskSetManager.logInfo(Logging.scala:54) 2021-07-15 18:03:36,748 | INFO | [task-result-getter-2] | Removed TaskSet 13.0, whose tasks have all completed, from pool | org.apache.spark.scheduler.cluster.YarnClusterScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,748 | INFO | [dag-scheduler-event-loop] | ResultStage 13 (getCallSite at StreamExecution.scala:173) finished in 0.052 s | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,749 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Job 8 finished: getCallSite at StreamExecution.scala:173, 
took 0.117909 s | org.apache.spark.scheduler.DAGScheduler.logInfo(Logging.scala:54) 2021-07-15 18:03:36,750 | INFO | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | batchID => 3 | cn.t3go.bigdata.mains.IncrementBizKafka.call(IncrementBizKafka.java:99) 2021-07-15 18:03:36,841 | ERROR | [stream execution thread for [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]] | Query [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab] terminated with error | org.apache.spark.sql.execution.streaming.MicroBatchExecution.logError(Logging.scala:91) java.lang.ClassNotFoundException: Failed to find data source: org.apache.hudi. Please find packages at http://spark.apache.org/third-party-projects.html at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:704) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:265) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249) at cn.t3go.bigdata.mains.IncrementBizKafka.sinkHudi(IncrementBizKafka.java:159) at cn.t3go.bigdata.mains.IncrementBizKafka$1.call(IncrementBizKafka.java:104) at cn.t3go.bigdata.mains.IncrementBizKafka$1.call(IncrementBizKafka.java:91) at org.apache.spark.sql.streaming.DataStreamWriter$$anonfun$foreachBatch$1.apply(DataStreamWriter.scala:391) at org.apache.spark.sql.streaming.DataStreamWriter$$anonfun$foreachBatch$1.apply(DataStreamWriter.scala:391) at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$19.apply(MicroBatchExecution.scala:548) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:95) at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:86) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:789) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:63) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:546) at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:545) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166) at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166) at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160) at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.DefaultSource
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$12.apply(DataSource.scala:681)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$12.apply(DataSource.scala:681)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:681)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:681)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:681)
  ... 28 more
2021-07-15 18:03:36,868 | ERROR | [Driver] | User class threw exception: org.apache.spark.sql.streaming.StreamingQueryException: Failed to find data source: org.apache.hudi. Please find packages at http://spark.apache.org/third-party-projects.html
=== Streaming Query ===
Identifier: [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]
Current Committed Offsets: {KafkaV2[Subscribe[t3_ts_android_device]]: {"t3_ts_android_device":{"2":531507,"5":531508,"4":531509,"7":531507,"1":531509,"3":531506,"6":531509,"0":531506}}}
Current Available Offsets: {KafkaV2[Subscribe[t3_ts_android_device]]: {"t3_ts_android_device":{"2":531555,"5":531556,"4":531556,"7":531555,"1":531556,"3":531554,"6":531556,"0":531554}}}
Current State: ACTIVE
Thread State: RUNNABLE
Logical Plan: KafkaV2[Subscribe[t3_ts_android_device]] | org.apache.spark.deploy.yarn.ApplicationMaster.logError(Logging.scala:91)
org.apache.spark.sql.streaming.StreamingQueryException: Failed to find data source: org.apache.hudi. Please find packages at http://spark.apache.org/third-party-projects.html
=== Streaming Query ===
Identifier: [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]
Current Committed Offsets: {KafkaV2[Subscribe[t3_ts_android_device]]: {"t3_ts_android_device":{"2":531507,"5":531508,"4":531509,"7":531507,"1":531509,"3":531506,"6":531509,"0":531506}}}
Current Available Offsets: {KafkaV2[Subscribe[t3_ts_android_device]]: {"t3_ts_android_device":{"2":531555,"5":531556,"4":531556,"7":531555,"1":531556,"3":531554,"6":531556,"0":531554}}}
Current State: ACTIVE
Thread State: RUNNABLE
Logical Plan: KafkaV2[Subscribe[t3_ts_android_device]]
  at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:297)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
Caused by: java.lang.ClassNotFoundException: Failed to find data source: org.apache.hudi. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:704)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:265)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
  at cn.t3go.bigdata.mains.IncrementBizKafka.sinkHudi(IncrementBizKafka.java:159)
  at cn.t3go.bigdata.mains.IncrementBizKafka$1.call(IncrementBizKafka.java:104)
  at cn.t3go.bigdata.mains.IncrementBizKafka$1.call(IncrementBizKafka.java:91)
  at org.apache.spark.sql.streaming.DataStreamWriter$$anonfun$foreachBatch$1.apply(DataStreamWriter.scala:391)
  at org.apache.spark.sql.streaming.DataStreamWriter$$anonfun$foreachBatch$1.apply(DataStreamWriter.scala:391)
  at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$19.apply(MicroBatchExecution.scala:548)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:95)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:86)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:789)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:63)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:546)
  at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
  at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:545)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
  at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
  at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
  at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
  at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
  ... 1 more
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.DefaultSource
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$12.apply(DataSource.scala:681)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$12.apply(DataSource.scala:681)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:681)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:681)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:681)
  ... 28 more
2021-07-15 18:03:36,870 | INFO | [Driver] | Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.sql.streaming.StreamingQueryException: Failed to find data source: org.apache.hudi. Please find packages at http://spark.apache.org/third-party-projects.html
=== Streaming Query ===
Identifier: [id = 6c7bb949-bcb6-494a-a576-e64b05643c31, runId = d6c5cb5d-7e6c-45b7-935e-0308327a0bab]
Current Committed Offsets: {KafkaV2[Subscribe[t3_ts_android_device]]: {"t3_ts_android_device":{"2":531507,"5":531508,"4":531509,"7":531507,"1":531509,"3":531506,"6":531509,"0":531506}}}
Current Available Offsets: {KafkaV2[Subscribe[t3_ts_android_device]]: {"t3_ts_android_device":{"2":531555,"5":531556,"4":531556,"7":531555,"1":531556,"3":531554,"6":531556,"0":531554}}}
Current State: ACTIVE
Thread State: RUNNABLE
Logical Plan: KafkaV2[Subscribe[t3_ts_android_device]]
  at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:297)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
Caused by: java.lang.ClassNotFoundException: Failed to find data source: org.apache.hudi. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:704)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:265)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
  at cn.t3go.bigdata.mains.IncrementBizKafka.sinkHudi(IncrementBizKafka.java:159)
  at cn.t3go.bigdata.mains.IncrementBizKafka$1.call(IncrementBizKafka.java:104)
  at cn.t3go.bigdata.mains.IncrementBizKafka$1.call(IncrementBizKafka.java:91)
  at org.apache.spark.sql.streaming.DataStreamWriter$$anonfun$foreachBatch$1.apply(DataStreamWriter.scala:391)
  at org.apache.spark.sql.streaming.DataStreamWriter$$anonfun$foreachBatch$1.apply(DataStreamWriter.scala:391)
  at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$19.apply(MicroBatchExecution.scala:548)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:95)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:86)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:789)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:63)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:546)
  at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
  at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:545)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
  at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
  at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
  at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
  at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
  ... 1 more
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.DefaultSource
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$12.apply(DataSource.scala:681)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$12.apply(DataSource.scala:681)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:681)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:681)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:681)
  ... 28 more
) | org.apache.spark.deploy.yarn.ApplicationMaster.logInfo(Logging.scala:54)
2021-07-15 18:03:36,881 | INFO | [shutdown-hook-0] | Invoking stop() from shutdown hook | org.apache.spark.SparkContext.logInfo(Logging.scala:54)
2021-07-15 18:03:36,895 | INFO | [shutdown-hook-0] | Stopped Spark@78afb29{HTTP/1.1, (http/1.1)}{10.4.45.248:22618} | org.spark_project.jetty.server.AbstractConnector.doStop(AbstractConnector.java:381)
2021-07-15 18:03:36,895 | INFO | [shutdown-hook-0] | node0 Stopped scavenging | org.spark_project.jetty.server.session.stopScavenging(HouseKeeper.java:149)
2021-07-15 18:03:36,896 | INFO | [shutdown-hook-0] | Stopped Spark web UI at http://node-group-1rXLd.1aac75b1-3981-40f3-b8f5-e41e74b4b9ba.com:22618 | org.apache.spark.ui.SparkUI.logInfo(Logging.scala:54)
2021-07-15 18:03:36,901 | INFO | [dispatcher-event-loop-0] | Driver requested a total number of 0 executor(s).
| org.apache.spark.deploy.yarn.YarnAllocator.logInfo(Logging.scala:54)
2021-07-15 18:03:36,903 | INFO | [shutdown-hook-0] | Shutting down all executors | org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.logInfo(Logging.scala:54)
2021-07-15 18:03:36,904 | INFO | [dispatcher-event-loop-1] | Asking each executor to shut down | org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint.logInfo(Logging.scala:54)
2021-07-15 18:03:36,907 | INFO | [shutdown-hook-0] | Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false) | org.apache.spark.scheduler.cluster.SchedulerExtensionServices.logInfo(Logging.scala:54)
2021-07-15 18:03:36,940 | INFO | [dispatcher-event-loop-0] | MapOutputTrackerMasterEndpoint stopped! | org.apache.spark.MapOutputTrackerMasterEndpoint.logInfo(Logging.scala:54)`
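For context on the root cause: the trace bottoms out in `java.lang.ClassNotFoundException: org.apache.hudi.DefaultSource`. Spark resolves the `.format("org.apache.hudi")` short name to that provider class and loads it reflectively, so this exception means no jar containing the Hudi Spark bundle was visible on the driver's runtime classpath for this run. The sketch below is a hypothetical, simplified illustration of that failure mode (it is NOT Spark's actual `DataSource.lookupDataSource` code; the class and method names are made up for this example):

```java
// Hypothetical sketch of the reflective provider lookup that fails in the
// stacktrace above. Simplified; not Spark's actual implementation.
public class DataSourceLookupSketch {

    /** Returns true if the named provider class can be loaded reflectively. */
    static boolean providerIsLoadable(String providerClass) {
        try {
            // Spark ultimately performs a Class.forName-style load of the
            // provider class behind .format(...); the short name
            // "org.apache.hudi" maps to org.apache.hudi.DefaultSource.
            Class.forName(providerClass, false,
                    Thread.currentThread().getContextClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            // This is the exception in the log: no jar on the runtime
            // classpath contains the requested class.
            return false;
        }
    }

    public static void main(String[] args) {
        // Loadable only when a Hudi bundle jar is on the classpath.
        System.out.println(providerIsLoadable("org.apache.hudi.DefaultSource"));
        // A JDK class is always loadable.
        System.out.println(providerIsLoadable("java.lang.String"));
    }
}
```

Given that 0.7/0.8 run fine with the same application code, one unconfirmed possibility is that the bundle jar rebuilt from master was not actually shipped with the YARN application; a quick check is `jar tf <bundle-jar> | grep DefaultSource` on the jar passed via `--jars` to confirm it contains `org/apache/hudi/DefaultSource.class`.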