Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi,

JavaRDD<Integer> distData = sc.parallelize(data);

On what basis does parallelize split the data into multiple datasets? How can we control how many of these datasets are executed per executor? For example, my data is a list of 1000 integers and I have a 2-node YARN cluster. It is dividing into
Subject: Re: Parallelize on spark context
Hi Naveen,

By default, when we call parallelize, the data is split into the default number of partitions (which we can control with the property spark.default.parallelism). If we want a specific call to parallelize to use a different number of partitions, we can pass it explicitly as the second argument, e.g. sc.parallelize(data, numSlices).
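To make the splitting concrete, here is a minimal, self-contained sketch of the range-based slicing that parallelize applies to a plain collection: slice i covers indices [i*len/n, (i+1)*len/n), so slice sizes differ by at most one. This is an illustration written from scratch (the class and method names here are not Spark APIs), not Spark's actual source.

```java
import java.util.ArrayList;
import java.util.List;

public class SliceDemo {
    // Hypothetical helper mimicking how parallelize slices a local collection:
    // partition i gets indices [i*len/n, (i+1)*len/n), so sizes differ by at most 1.
    static <T> List<List<T>> slice(List<T> data, int numSlices) {
        List<List<T>> slices = new ArrayList<>();
        int len = data.size();
        for (int i = 0; i < numSlices; i++) {
            int start = (int) ((long) i * len / numSlices);
            int end = (int) ((long) (i + 1) * len / numSlices);
            slices.add(new ArrayList<>(data.subList(start, end)));
        }
        return slices;
    }

    public static void main(String[] args) {
        // The 1000-integer example from the question, split 4 ways.
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) data.add(i);
        List<List<Integer>> parts = slice(data, 4);
        for (List<Integer> p : parts) System.out.println(p.size());
        // each of the 4 slices holds 250 of the 1000 integers
    }
}
```

With sc.parallelize(data, 4) on a 2-node cluster you would therefore see 4 tasks, which the scheduler distributes across the available executor cores; nothing pins a fixed number of partitions to each executor.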