Hello,

can anybody kindly help me out a little bit here? I just verified the
problem is still there on Spark 1.6.0 and emr-4.3.0 as well. It's
definitely a Kinesis-related issue, since with Spark 1.6.0 I'm successfully
able to get Streaming drivers to terminate with no issue IF I don't use
Kinesis and open any Receivers.

Thank you!

Roberto


On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio <roberto.coluc...@gmail.com
> wrote:

> Hi,
>
> I'm struggling around an issue ever since I tried to upgrade my Spark
> Streaming solution from 1.4.1 to 1.5+.
>
> I have a Spark Streaming app which creates 3 ReceiverInputDStreams
> leveraging KinesisUtils.createStream API.
>
> I used to leverage a timeout to terminate my app
> (StreamingContext.awaitTerminationOrTimeout(timeout)) gracefully (SparkConf
> spark.streaming.stopGracefullyOnShutdown=true).
>
> I used to submit my Spark app on EMR in yarn-cluster mode.
>
> Everything worked fine up to Spark 1.4.1 (on EMR AMI 3.9).
>
> Since I upgraded (tried with Spark 1.5.2 on emr-4.2.0 and Spark 1.6.0 on
> emr-4.3.0) I can't get the app to actually terminate. Logs tells me it
> tries to, but no confirmation of receivers stop is retrieved. Instead, when
> the timer gets to the next period, the StreamingContext continues its
> processing for a while (then it gets killed with a SIGTERM 15. YARN's vmem
> and pmem killls disabled).
>
> ...
>
> 16/02/02 21:22:08 INFO ApplicationMaster: Final app status: SUCCEEDED, 
> exitCode: 0
> 16/02/02 21:22:08 INFO StreamingContext: Invoking stop(stopGracefully=true) 
> from shutdown hook
> 16/02/02 21:22:08 INFO ReceiverTracker: Sent stop signal to all 3 receivers
> 16/02/02 21:22:18 INFO ReceiverTracker: Waiting for receiver job to terminate 
> gracefully
> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned shuffle 141
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
> 172.31.3.140:50152 in memory (size: 23.9 KB, free: 2.1 GB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
> ip-172-31-3-141.ec2.internal:41776 in memory (size: 23.9 KB, free: 1224.9 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
> ip-172-31-3-140.ec2.internal:36295 in memory (size: 23.9 KB, free: 1224.0 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
> ip-172-31-3-141.ec2.internal:56428 in memory (size: 23.9 KB, free: 1224.9 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
> ip-172-31-3-140.ec2.internal:50542 in memory (size: 23.9 KB, free: 1224.7 MB)
> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned accumulator 184
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
> 172.31.3.140:50152 in memory (size: 3.0 KB, free: 2.1 GB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
> ip-172-31-3-141.ec2.internal:41776 in memory (size: 3.0 KB, free: 1224.9 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
> ip-172-31-3-141.ec2.internal:56428 in memory (size: 3.0 KB, free: 1224.9 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
> ip-172-31-3-140.ec2.internal:36295 in memory (size: 3.0 KB, free: 1224.0 MB)
> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
> ip-172-31-3-140.ec2.internal:50542 in memory (size: 3.0 KB, free: 1224.7 MB)
> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 680 for time 1454448300000 
> ms for checkpointing
> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 708 for time 1454448300000 
> ms for checkpointing
> 16/02/02 21:25:00 INFO TransformedDStream: Slicing from 1454448000000 ms to 
> 1454448300000 ms (aligned to 1454448000000 ms and 1454448300000 ms)
> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 777 for time 1454448300000 
> ms for checkpointing
> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 801 for time 1454448300000 
> ms for checkpointing
> 16/02/02 21:25:00 INFO JobScheduler: Added jobs for time 1454448300000 ms
> 16/02/02 21:25:00 INFO JobGenerator: Checkpointing graph for time 
> 1454448300000 ms
> 16/02/02 21:25:00 INFO DStreamGraph: Updating checkpoint data for time 
> 1454448300000 ms
> 16/02/02 21:25:00 INFO JobScheduler: Starting job streaming job 1454448300000 
> ms.0 from job set of time 1454448300000 ms
>
> ...
>
>
> Please, this is really blocking in the upgrade process to latest Spark
> versions and I really don't know how to work it around.
>
> Any help would be very much appreciated.
>
> Thank you,
>
> Roberto
>
>
>

Reply via email to