Hi,
I was facing GC overhead errors while running an application on 570 MB of
data (with RDD replication).

To fix the heap errors, I repartitioned the RDD into 10 partitions:

    val logData = sc.textFile("hdfs:/text_data/text data.txt").persist(StorageLevel.MEMORY_ONLY_2)
    val parts = logData.coalesce(10, true)
    println(parts.partitions.length)

But the problem is that the web UI still shows the number of partitions as
5, while the print statement outputs 10. I also tried repartition(), but I
see the same problem.
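
One possible explanation, offered as a sketch rather than a confirmed
diagnosis: the web UI's storage page lists only RDDs that have been
persisted, and in the snippet above persist() is called on logData (which
was read with 5 partitions), not on the coalesced RDD. The variant below
persists the repartitioned RDD instead and runs an action so that it is
actually cached. The object name and app name are hypothetical, and the
master is assumed to be supplied by spark-submit.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    // Hypothetical driver object, for illustration only.
    object RepartitionThenPersist {
      def main(args: Array[String]): Unit = {
        // Assumption: the master URL comes from spark-submit, so it is not set here.
        val conf = new SparkConf().setAppName("repartition-then-persist")
        val sc = new SparkContext(conf)

        // Read the file without caching it; its partition count follows the input splits.
        val logData = sc.textFile("hdfs:/text_data/text data.txt")

        // Shuffle into 10 partitions, then persist this RDD (2 replicas) rather than logData.
        val parts = logData.repartition(10).persist(StorageLevel.MEMORY_ONLY_2)

        // persist() is lazy: an action is needed before anything is cached.
        parts.count()

        println(parts.partitions.length)
        sc.stop()
      }
    }

If this assumption holds, the storage page should then list the cached RDD
with its 10 partitions instead of the 5 partitions of logData.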

Also, should the web UI show the storage details of each partition twice
when I replicate the RDD? I ask because the web UI displays each partition
only once, while it says 2 x replicated.

Can someone help me out with this?

-Karthik
