Just to elaborate on what Silvio wrote below: check whether you are 
referencing a class or object member variable in a function literal (closure) 
passed to one of the RDD methods.
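A minimal sketch of that pattern (the class and field names here are made up for illustration): referencing a member variable inside an RDD closure captures the whole enclosing object, so Spark serializes it into every task. Copying the field into a local val first keeps the closure small.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical example class; `separator` stands in for any member variable.
class Tokenizer(sc: SparkContext) {
  val separator = ","

  def tokenizeBad(lines: RDD[String]): RDD[Array[String]] =
    // `separator` is really `this.separator`, so the closure captures `this`
    // and the entire Tokenizer instance is serialized with each task
    lines.map(_.split(separator))

  def tokenizeGood(lines: RDD[String]): RDD[Array[String]] = {
    val sep = separator            // copy the field into a local val
    lines.map(_.split(sep))        // closure now captures only the small local
  }
}
```

If the enclosing class is not serializable at all, the first version fails with a `Task not serializable` exception rather than just producing large tasks, which is the easier case to spot.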

Mohammed
Author: Big Data Analytics with 
Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

From: Silvio Fiorito [mailto:silvio.fior...@granturing.com]
Sent: Wednesday, March 2, 2016 8:43 PM
To: Bijuna; user
Subject: RE: Stage contains task of large size

One source of this could be more data than you intended (or realized) getting 
serialized as part of your operations. Which transformations are you using? 
Are you referencing local instance variables from your driver app in your 
transformations? For instance, you may have a large collection used in a 
transformation; it will be serialized and sent to each executor. If you have 
something like that, look to use broadcast variables instead.
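A hedged sketch of the difference (the file name and lookup table are invented for illustration). Captured directly, a large driver-side collection is serialized into every task's closure; wrapped in a broadcast variable, it is shipped to each executor once and the tasks stay small.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("broadcast-demo"))

// A large driver-side lookup table (illustrative data).
val lookup: Map[String, Int] = (1 to 1000000).map(i => i.toString -> i).toMap

// Without broadcast: `lookup` is serialized into the closure of every task,
// which is exactly what triggers the "task of very large size" warning.
val slow = sc.textFile("events.txt").map(line => lookup.getOrElse(line, 0))

// With broadcast: the table is sent to each executor once; tasks reference
// it through the small broadcast handle.
val bcLookup = sc.broadcast(lookup)
val fast = sc.textFile("events.txt").map(line => bcLookup.value.getOrElse(line, 0))
```

Inside the closure you must go through `bcLookup.value`; referencing `lookup` again would reintroduce the direct capture.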

From: Bijuna<mailto:bij...@gmail.com>
Sent: Wednesday, March 2, 2016 11:20 PM
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Stage contains task of large size


Spark users,

We are running a Spark application in standalone mode. We see warning messages 
in the logs that say:

Stage 46 contains a task of very large size (983 KB). The maximum recommended 
task size is 100 KB.

What is the recommended approach to fix this warning? Please let me know.

Thank you
Bijuna

Sent from my iPad
---------------------------------------------------------------------
To unsubscribe, e-mail: 
user-unsubscr...@spark.apache.org<mailto:user-unsubscr...@spark.apache.org>
For additional commands, e-mail: 
user-h...@spark.apache.org<mailto:user-h...@spark.apache.org>