Hello,

Is there a way to estimate the approximate size of a DataFrame? I know we can cache it and look at the size in the UI, but I'm trying to do this programmatically. With an RDD, I can take a sample, sum up the size of the sampled records using SizeEstimator, and then extrapolate to the entire RDD. That gives me the approximate size of the RDD. With DataFrames it's trickier because of the columnar storage. How do we do it?
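A minimal sketch of the sample-and-extrapolate approach for RDDs, assuming Spark's `org.apache.spark.util.SizeEstimator` is on the classpath. The names `RddSizeEstimate` and `estimateRddSize`, and the default `fraction` and `seed`, are hypothetical choices for illustration. It estimates the size of the deserialized JVM objects, which may differ from what the UI reports for a cached RDD (for example, with a serialized storage level).

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.util.SizeEstimator

// Hypothetical helper: rough size in bytes of the deserialized records in an RDD.
object RddSizeEstimate {
  def estimateRddSize[T](rdd: RDD[T], fraction: Double = 0.01, seed: Long = 42L): Long = {
    // Job 1: total number of records to extrapolate to.
    val totalCount = rdd.count()

    // Job 2: measure each sampled record on the executors and bring back
    // only the summed (bytes, records) pair instead of the records themselves.
    val (sampleBytes, sampleCount) = rdd
      .sample(withReplacement = false, fraction, seed)
      .map(record => (SizeEstimator.estimate(record.asInstanceOf[AnyRef]), 1L))
      .fold((0L, 0L))((a, b) => (a._1 + b._1, a._2 + b._2))

    // Scale the average sampled record size up to the full record count.
    if (sampleCount == 0L) 0L
    else (sampleBytes.toDouble / sampleCount * totalCount).toLong
  }
}
```

In the shell this would be called as `RddSizeEstimate.estimateRddSize(temp, fraction = 0.1)`. A very small fraction on a small RDD can yield an empty sample, in which case the sketch returns 0. Applying it to `df.rdd` would measure `Row` objects, which is likely to differ from a cached DataFrame's compressed columnar size, so it does not answer the DataFrame question by itself.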
On a related note, I see the size of an RDD object as roughly 69.5 MB. Is that the footprint of the RDD in the driver JVM?

scala> val temp = sc.parallelize(Array(1,2,3,4,5,6))
scala> SizeEstimator.estimate(temp)
res13: Long = 69507320

Srikanth