I noticed a behaviour: if I'm using

val temp = sc.parallelize(1 to 100000)

temp.collect

The task size stays small, on the order of bytes (let's say 1120 bytes).

But if I change this to a for loop:

import scala.collection.mutable.ArrayBuffer
val data = new ArrayBuffer[Integer]()
for (i <- 1 to 1000000) data += i   // materializes every element in driver memory
val distData = sc.parallelize(data)
distData.collect

Here the task size is in MBs: 5000120 bytes, roughly 5 MB.

Any input here would be appreciated; this is really confusing!

1) Why does the data travel from the driver to the executors every time an
action is performed? I thought the data lives in the executors' memory, and
only the code is pushed from the driver to the executors. (See the first
sketch below for what I expected.)

2) Why does a Range not increase the task size, whereas any other collection
increases it in proportion to the number of elements? (See the second sketch
below.)
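
On question 1, here is a minimal sketch of what I expected to be able to do,
assuming cache() keeps the partitions in executor memory once an action has
materialised them:

val distData = sc.parallelize(data).cache()
distData.count()   // first action ships the data out and fills the cache
distData.collect() // I expected later actions to read from executor memory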
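
And to make question 2 concrete, here is a minimal sketch that compares
serialized sizes locally with plain Java serialization, on the assumption
that it roughly tracks what gets packed into a task; serializedSize is just
a throwaway helper I wrote for this, not a Spark API:

import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

// Throwaway helper: size of an object under plain Java serialization.
def serializedSize(obj: AnyRef): Int = {
  val bytes = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bytes)
  oos.writeObject(obj)
  oos.close()
  bytes.size()
}

serializedSize(1 to 1000000)                   // tiny: a Range only stores start, end and step
serializedSize((1 to 1000000).to[ArrayBuffer]) // large: every element is written out

If that assumption holds, it would line up with the ~1 KB vs ~5 MB gap above.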




