Hello, can anybody kindly help me out a little bit here? I just verified the problem is still there on Spark 1.6.0 and emr-4.3.0 as well. It's definitely a Kinesis-related issue, since with Spark 1.6.0 I'm successfully able to get Streaming drivers to terminate with no issue IF I don't use Kinesis and open any Receivers.
Thank you! Roberto On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio <roberto.coluc...@gmail.com > wrote: > Hi, > > I'm struggling around an issue ever since I tried to upgrade my Spark > Streaming solution from 1.4.1 to 1.5+. > > I have a Spark Streaming app which creates 3 ReceiverInputDStreams > leveraging KinesisUtils.createStream API. > > I used to leverage a timeout to terminate my app > (StreamingContext.awaitTerminationOrTimeout(timeout)) gracefully (SparkConf > spark.streaming.stopGracefullyOnShutdown=true). > > I used to submit my Spark app on EMR in yarn-cluster mode. > > Everything worked fine up to Spark 1.4.1 (on EMR AMI 3.9). > > Since I upgraded (tried with Spark 1.5.2 on emr-4.2.0 and Spark 1.6.0 on > emr-4.3.0) I can't get the app to actually terminate. Logs tells me it > tries to, but no confirmation of receivers stop is retrieved. Instead, when > the timer gets to the next period, the StreamingContext continues its > processing for a while (then it gets killed with a SIGTERM 15. YARN's vmem > and pmem killls disabled). > > ... > > 16/02/02 21:22:08 INFO ApplicationMaster: Final app status: SUCCEEDED, > exitCode: 0 > 16/02/02 21:22:08 INFO StreamingContext: Invoking stop(stopGracefully=true) > from shutdown hook > 16/02/02 21:22:08 INFO ReceiverTracker: Sent stop signal to all 3 receivers > 16/02/02 21:22:18 INFO ReceiverTracker: Waiting for receiver job to terminate > gracefully > 16/02/02 21:22:52 INFO ContextCleaner: Cleaned shuffle 141 > 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on > 172.31.3.140:50152 in memory (size: 23.9 KB, free: 2.1 GB) > 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on > ip-172-31-3-141.ec2.internal:41776 in memory (size: 23.9 KB, free: 1224.9 MB) > 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on > ip-172-31-3-140.ec2.internal:36295 in memory (size: 23.9 KB, free: 1224.0 MB) > 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on > ip-172-31-3-141.ec2.internal:56428 in memory (size: 23.9 KB, free: 1224.9 MB) > 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on > ip-172-31-3-140.ec2.internal:50542 in memory (size: 23.9 KB, free: 1224.7 MB) > 16/02/02 21:22:52 INFO ContextCleaner: Cleaned accumulator 184 > 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on > 172.31.3.140:50152 in memory (size: 3.0 KB, free: 2.1 GB) > 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on > ip-172-31-3-141.ec2.internal:41776 in memory (size: 3.0 KB, free: 1224.9 MB) > 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on > ip-172-31-3-141.ec2.internal:56428 in memory (size: 3.0 KB, free: 1224.9 MB) > 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on > ip-172-31-3-140.ec2.internal:36295 in memory (size: 3.0 KB, free: 1224.0 MB) > 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on > ip-172-31-3-140.ec2.internal:50542 in memory (size: 3.0 KB, free: 1224.7 MB) > 16/02/02 21:25:00 INFO StateDStream: Marking RDD 680 for time 1454448300000 > ms for checkpointing > 16/02/02 21:25:00 INFO StateDStream: Marking RDD 708 for time 1454448300000 > ms for checkpointing > 16/02/02 21:25:00 INFO TransformedDStream: Slicing from 1454448000000 ms to > 1454448300000 ms (aligned to 1454448000000 ms and 1454448300000 ms) > 16/02/02 21:25:00 INFO StateDStream: Marking RDD 777 for time 1454448300000 > ms for checkpointing > 16/02/02 21:25:00 INFO StateDStream: Marking RDD 801 for time 1454448300000 > ms for checkpointing > 16/02/02 21:25:00 INFO JobScheduler: Added jobs for time 1454448300000 ms > 16/02/02 21:25:00 INFO JobGenerator: Checkpointing graph for time > 1454448300000 ms > 16/02/02 21:25:00 INFO DStreamGraph: Updating checkpoint data for time > 1454448300000 ms > 16/02/02 21:25:00 INFO JobScheduler: Starting job streaming job 1454448300000 > ms.0 from job set of time 1454448300000 ms > > ... > > > Please, this is really blocking in the upgrade process to latest Spark > versions and I really don't know how to work it around. > > Any help would be very much appreciated. > > Thank you, > > Roberto > > >