Hello,

Is there a way to estimate the approximate size of a DataFrame? I know we
can cache it and look at the size in the UI, but I'm trying to do this
programmatically. With an RDD, I can take a sample, sum up the sizes of the
sampled elements using SizeEstimator, and then extrapolate to the entire
RDD. That gives me the approximate size of the RDD. With DataFrames, it's
tricky due to columnar storage. How do we do it?
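
For reference, the RDD approach I mean is roughly the following. This is
only a sketch: approxRddSizeBytes, extrapolate and the 1% sample fraction
are placeholder names and values of my own, and summing per-element
estimates ignores any objects shared between elements.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.util.SizeEstimator

// Hypothetical helper: scale the average sampled element size up to the full count.
def extrapolate(sampleBytes: Long, sampleCount: Long, totalCount: Long): Long =
  if (sampleCount == 0L) 0L // empty sample: no estimate available, report 0
  else (sampleBytes.toDouble / sampleCount * totalCount).toLong

// Hypothetical helper: estimate the in-memory size of an RDD from a sample.
def approxRddSizeBytes[T](rdd: RDD[T], fraction: Double = 0.01): Long = {
  // Estimate each sampled element on the executors and sum (bytes, count) pairs.
  val (bytes, n) = rdd
    .sample(withReplacement = false, fraction)
    .map(x => (SizeEstimator.estimate(x.asInstanceOf[AnyRef]), 1L))
    .fold((0L, 0L)) { case ((b1, n1), (b2, n2)) => (b1 + b2, n1 + n2) }
  // Note: rdd.count() triggers a second pass over the data.
  extrapolate(bytes, n, rdd.count())
}
```

This estimates deserialized JVM object sizes, so I would not expect it to
match the size of a cached DataFrame.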

On a related note, I see the size of an RDD object reported as ~70MB. Is
that the footprint of the RDD in the driver JVM?

scala> import org.apache.spark.util.SizeEstimator
scala> val temp = sc.parallelize(Array(1,2,3,4,5,6))
scala> SizeEstimator.estimate(temp)
res13: Long = 69507320

Srikanth
