Liyin Tang created SPARK-16244:
----------------------------------

Summary: Failed job/stage couldn't stop JobGenerator immediately.
Key: SPARK-16244
URL: https://issues.apache.org/jira/browse/SPARK-16244
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.5.2
Reporter: Liyin Tang
This streaming job has a very simple DAG: each batch has only one job, and each job has only one stage.

Based on the following logs, we observed a potential race condition. Stage 1 failed due to task failures, which triggered failJobAndIndependentStages. Meanwhile, the next stage (job), 2, was submitted and successfully ran a few tasks before the JobGenerator was stopped via the shutdown hook.

Because the next job ran a few tasks successfully, it corrupted the checkpoint and offset management.

I will attach the log to the JIRA as well.
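
The ordering described in this report can be replayed with a toy model. The sketch below is plain Python, not Spark code: `ToyScheduler`, `fail_batch`, `submit_batch`, `run_shutdown_hook`, and `replay` are hypothetical names, and the `gate_on_failure` switch is an assumed mitigation for illustration only, not a fix confirmed by this issue. It only shows why a batch submitted between a failure and the later stop can still run.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Toy model of a batch scheduler; not the Spark implementation."""
    gate_on_failure: bool = False   # hypothetical mitigation switch
    failed: bool = False            # set as soon as a batch fails
    stopped: bool = False           # set later, when the shutdown hook runs
    completed: list = field(default_factory=list)

    def fail_batch(self, batch_id: int) -> None:
        # The failure is recorded at once, but the stop happens later.
        self.failed = True

    def run_shutdown_hook(self) -> None:
        self.stopped = True

    def submit_batch(self, batch_id: int) -> bool:
        # Without the gate, only a completed stop rejects new batches.
        if self.stopped or (self.gate_on_failure and self.failed):
            return False
        self.completed.append(batch_id)
        return True


def replay(gate_on_failure: bool) -> list:
    """Replay the reported ordering: fail 1, submit 2, then stop."""
    s = ToyScheduler(gate_on_failure=gate_on_failure)
    s.fail_batch(1)
    s.submit_batch(2)       # arrives before the shutdown hook has run
    s.run_shutdown_hook()
    s.submit_batch(3)       # arrives after the stop
    return s.completed
```

Under these assumptions, `replay(False)` lets batch 2 run even though batch 1 has already failed, while `replay(True)` rejects it because the failure flag is checked at submission time.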