Re: Unable to broadcast a very large variable

2019-04-11 Thread V0lleyBallJunki3
I am not using pyspark. The job is written in Scala



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Question about relationship between number of files and initial tasks(partitions)

2019-04-11 Thread Sagar Grover
Extending Arthur's question,
I am facing the same problem(no of partitions were huge- cored 960,
partitions - 16000). I tried to decrease the number of partitions with
coalesce, but the problem is unbalanced data. After using coalesce, it
gives me Java out of heap space error. There was no out of heap error
without coalesce. I am guessing the error is due to uneven data and some
heavy partitions getting merged together.
Let me know if you have any pointers on how to handle this.

On Wed, Apr 10, 2019 at 11:21 PM yeikel valdes  wrote:

> If you need to reduce the number of partitions you could also try
> df.coalesce
>
>  On Thu, 04 Apr 2019 06:52:26 -0700 * jasonnerot...@gmail.com
>  * wrote 
>
> Have you tried something like this?
>
> spark.conf.set("spark.sql.shuffle.partitions", "5" )
>
>
>
> On Wed, Apr 3, 2019 at 8:37 PM Arthur Li  wrote:
>
> Hi Sparkers,
>
> I noticed that in my spark application, the number of tasks in the first
> stage is equal to the number of files read by the application(at least for
> Avro) if the number of cpu cores is less than the number of files. Though
> If cpu cores are more than number of files, it's usually equal to default
> parallelism number. Why is it behave like this? Would this require a lot of
> resource from the driver? Is there any way we can do to decrease the number
> of tasks(partitions) in the first stage without merge files before loading?
>
> Thanks,
> Arthur
>
>
> IMPORTANT NOTICE:  This message, including any attachments (hereinafter
> collectively referred to as "Communication"), is intended only for the 
> addressee(s)
> named above.  This Communication may include information that is
> privileged, confidential and exempt from disclosure under applicable law.
> If the recipient of this Communication is not the intended recipient, or
> the employee or agent responsible for delivering this Communication to the
> intended recipient, you are notified that any dissemination, distribution
> or copying of this Communication is strictly prohibited.  If you have
> received this Communication in error, please notify the sender immediately
> by phone or email and permanently delete this Communication from your
> computer without making a copy. Thank you.
>
>
>
> --
> Thanks,
> Jason
>
>
>