Re: Spark stuck at removing broadcast variable

2020-04-18 Thread Waleed Fateem
This might be obvious but just checking anyway, did you confirm whether or
not all of the messages have already been consumed by Spark? If that's the
case then I wouldn't expect much to happen unless new data comes into your
Kafka topic.

If you're a hundred percent sure that there's still plenty more data to be
consumed by Spark and that isn't happening, then I would suggest generating
Java thread dumps (using Java's jstack command) from your driver's process.
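A minimal sketch of the thread-dump suggestion above, assuming jstack is on the PATH and you know the driver's PID (the function name and parameters are illustrative, not part of any Spark API). Taking several dumps a few seconds apart lets you compare stacks and spot a thread that stays blocked:

```python
import subprocess
import time

def collect_thread_dumps(pid, count=3, interval_s=10, dump_cmd="jstack"):
    """Capture several thread dumps of a JVM process so they can be
    compared for threads stuck in the same stack. Returns the raw texts."""
    dumps = []
    for i in range(count):
        # Runs e.g.: jstack <pid>
        result = subprocess.run([dump_cmd, str(pid)],
                                capture_output=True, text=True, check=True)
        dumps.append(result.stdout)
        if i < count - 1:
            time.sleep(interval_s)
    return dumps
```

A thread that shows the same stack (e.g. waiting on a lock or a Kafka poll) in every dump is a likely place where the driver is stuck.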

On Sat, Apr 18, 2020 at 2:43 PM Sean Owen  wrote:

> I don't think that means it's stuck on removing something; it was
> removed. Not sure what it is waiting on - more data perhaps?
>
> On Sat, Apr 18, 2020 at 2:22 PM Alchemist 
> wrote:
> >
> > I am running a simple Spark Structured Streaming application that is
> pulling data from a Kafka topic. I have a Kafka topic with nearly 1000
> partitions. I am running this app on a 6-node EMR cluster with 4 cores and
> 16 GB RAM per node. I observed that Spark is trying to pull data from all
> 1024 Kafka partitions, and after running successfully for a few iterations
> it is stuck with the following log messages:
> >
> > 20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 101
> > 20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 66
> > 20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 77
> > 20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 78
> >
> > 20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on
> in memory (size: 4.5 KB, free: 2.7 GB)
> > 20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on
> ip- in memory (size: 4.5 KB, free: 2.7 GB)
> > 20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on
> ip- in memory (size: 4.5 KB, free: 2.7 GB)
> > Then Spark shows RUNNING but it is NOT processing any data.
> >
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Spark stuck at removing broadcast variable

2020-04-18 Thread Sean Owen
I don't think that means it's stuck on removing something; it was
removed. Not sure what it is waiting on - more data perhaps?

On Sat, Apr 18, 2020 at 2:22 PM Alchemist  wrote:
>
> I am running a simple Spark Structured Streaming application that is pulling 
> data from a Kafka topic. I have a Kafka topic with nearly 1000 partitions. I 
> am running this app on a 6-node EMR cluster with 4 cores and 16 GB RAM per 
> node. I observed that Spark is trying to pull data from all 1024 Kafka 
> partitions, and after running successfully for a few iterations it is stuck 
> with the following log messages:
>
> 20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 101
> 20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 66
> 20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 77
> 20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 78
>
> 20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on  in 
> memory (size: 4.5 KB, free: 2.7 GB)
> 20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on ip- in 
> memory (size: 4.5 KB, free: 2.7 GB)
> 20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on ip- in 
> memory (size: 4.5 KB, free: 2.7 GB)
> Then Spark shows RUNNING but it is NOT processing any data.
>




Spark stuck at removing broadcast variable

2020-04-18 Thread Alchemist
I am running a simple Spark Structured Streaming application that is pulling 
data from a Kafka topic. I have a Kafka topic with nearly 1000 partitions. I am 
running this app on a 6-node EMR cluster with 4 cores and 16 GB RAM per node. I 
observed that Spark is trying to pull data from all 1024 Kafka partitions, and 
after running successfully for a few iterations it is stuck with the following 
log messages:
20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 101
20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 66
20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 77
20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 78
20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on  in 
memory (size: 4.5 KB, free: 2.7 GB)
20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on ip- in 
memory (size: 4.5 KB, free: 2.7 GB)
20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on ip- in 
memory (size: 4.5 KB, free: 2.7 GB)
Then Spark shows RUNNING but it is NOT processing any data.


Re: Spark structured streaming - performance tuning

2020-04-18 Thread Alex Ott
Just to clarify - I didn't state this explicitly in my answer. When you're
working with Kafka, every Kafka partition is mapped to a Spark partition,
and every Spark partition is mapped to a task. But you can use `coalesce`
to decrease the number of Spark partitions, so you'll have fewer tasks...
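To make the partition-to-task mapping concrete, here is a back-of-the-envelope calculation in plain Python, using the numbers from the other thread (1024 Kafka partitions, 6 nodes with 4 cores each); the helper function is illustrative, not a Spark API:

```python
import math

def task_waves(num_partitions, num_executors, cores_per_executor):
    """Each Kafka partition becomes one Spark task per micro-batch;
    the tasks run in waves limited by the total number of cores."""
    total_cores = num_executors * cores_per_executor
    return math.ceil(num_partitions / total_cores)

# 1024 partitions on 6 nodes with 4 cores each (24 cores total):
print(task_waves(1024, 6, 4))  # -> 43 waves of tasks per micro-batch
```

Reducing the number of Spark partitions with `coalesce` (or adding cores) reduces the number of waves, and with it the per-batch scheduling overhead.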

Srinivas V  at "Sat, 18 Apr 2020 10:32:33 +0530" wrote:
 SV> Thank you Alex. I will check it out and let you know if I have any 
 SV> questions

 SV> On Fri, Apr 17, 2020 at 11:36 PM Alex Ott  wrote:

 SV> http://shop.oreilly.com/product/0636920047568.do has quite good 
 SV> information on it.  For Kafka, you need to start with the approximation 
 SV> that processing of each partition is a separate task that needs to be 
 SV> executed, so you need to plan the number of cores correspondingly.
 SV>
 SV> Srinivas V  at "Thu, 16 Apr 2020 22:49:15 +0530" wrote:
 SV>  SV> Hello, 
 SV>  SV> Can someone point me to a good video or document which talks 
 SV>  SV> about performance tuning for a Structured Streaming app? 
 SV>  SV> I am looking especially at listening to Kafka topics, say 5 
 SV>  SV> topics each with 100 partitions.
 SV>  SV> Trying to figure out the best cluster size and number of executors 
 SV>  SV> and cores required. 


-- 
With best wishes,
Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
