Hi Shreesh,

You can definitely decide how many partitions your data should be split into, by passing a 'minPartitions' argument to sc.textFile("input/path", minPartitions) and a 'numSlices' argument to sc.parallelize(localCollection, numSlices). In fact, every method that creates an initial RDD gives you the option to specify the number of partitions. Moreover, you can change the number of partitions at any point by calling one of these methods on your RDD (there is a short sketch of all of this at the end of this message):
'coalesce(numPartitions)': Decrease the number of partitions in the RDD to numPartitions. Useful for running operations more efficiently after filtering down a large dataset.

'repartition(numPartitions)': Reshuffle the data in the RDD randomly to create either more or fewer partitions and balance it across them. This always shuffles all data over the network.

'repartitionAndSortWithinPartitions(partitioner)': Repartition the RDD according to the given partitioner and, within each resulting partition, sort records by their keys. This is more efficient than calling repartition and then sorting within each partition because it can push the sorting down into the shuffle machinery.

You can also set these properties to tune your Spark environment (the second sketch below shows one way to set them):

spark.driver.cores: Number of cores to use for the driver process, in cluster mode only.
spark.executor.cores: The number of cores to use on each executor.
spark.driver.memory: Amount of memory to use for the driver process, i.e. where the SparkContext is initialized.
spark.executor.memory: Amount of memory to use per executor process, in the same format as JVM memory strings (e.g. "512m", "2g").

In addition, you can set the number of worker processes per node with "SPARK_WORKER_INSTANCES" and the number of executors to start with "SPARK_EXECUTOR_INSTANCES" in the "$SPARK_HOME/conf/spark-env.sh" file.

Thanks,
Himanshu
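P.S. Here is a quick, untested sketch of the partitioning calls above, all in one place. The input path, keys, and partition counts are made up for illustration:

  import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("partition-demo"))

  // Ask for at least 8 partitions when reading a file.
  val lines = sc.textFile("input/path", 8)

  // Slice a local collection into 4 partitions.
  val nums = sc.parallelize(1 to 1000000, 4)

  // After a heavy filter, shrink to 2 partitions; coalesce avoids a full shuffle.
  val kept = nums.filter(_ % 100 == 0).coalesce(2)

  // Grow back to 16 partitions; repartition always shuffles all the data.
  val spread = kept.repartition(16)

  // Partition by key and sort within each partition in a single shuffle.
  val pairs = spread.map(n => (n % 10, n))
  val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(4))

  println(sorted.partitions.length)  // 4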
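And a sketch of setting the tuning properties programmatically; the values here are purely illustrative, not recommendations:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("tuning-demo")
    .set("spark.executor.cores", "4")
    .set("spark.executor.memory", "4g")  // JVM memory string format
    .set("spark.driver.cores", "2")      // honored in cluster mode only

  val sc = new SparkContext(conf)

  // Note: spark.driver.memory must be set before the driver JVM starts,
  // so pass it on the command line rather than in SparkConf:
  //   spark-submit --driver-memory 2g --executor-memory 4g --executor-cores 4 ...

  // And in $SPARK_HOME/conf/spark-env.sh:
  //   SPARK_WORKER_INSTANCES=2      # worker processes per node (standalone mode)
  //   SPARK_EXECUTOR_INSTANCES=4    # executors to start (YARN mode)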