SizeEstimator.estimate(df) will not give the size of the DataFrame, right?
I think it will give the size of the df object itself.
With an RDD, I sample() and collect() and sum the size of each row. If I do
the same with a DataFrame, the result will no longer be its size when
represented in columnar format.
I'd also like to know how
From a quick glance at SparkStrategies.scala, when statistics.sizeInBytes
of the LogicalPlan is <= autoBroadcastJoinThreshold, the plan's output
would be used in the broadcast join as the 'build' relation.
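
To make that condition concrete, here is a minimal sketch of the comparison in plain Scala, with no Spark dependency. BroadcastCheck and canBroadcast are hypothetical names for illustration, not Spark's own code. The "threshold > 0" guard is an assumption based on the documented behaviour that setting spark.sql.autoBroadcastJoinThreshold to -1 disables broadcasting.

```scala
object BroadcastCheck {
  // Hypothetical helper, not part of Spark: returns true when a plan with the
  // given estimated size would qualify as the 'build' side of a broadcast join.
  // Assumption: a non-positive threshold means broadcasting is disabled.
  def canBroadcast(sizeInBytes: BigInt, autoBroadcastJoinThreshold: Long): Boolean =
    autoBroadcastJoinThreshold > 0 &&
      sizeInBytes <= BigInt(autoBroadcastJoinThreshold)
}
```

In a Spark 1.x shell, the size to pass in could presumably be read from df.queryExecution.analyzed.statistics.sizeInBytes. That is a developer-facing API and its exact path differs between versions, so treat it as an assumption to check against your release.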
FYI
On Mon, Aug 10, 2015 at 8:04 AM, Srikanth srikanth...@gmail.com wrote:
Hello,
Is there a way to estimate the approximate size of a DataFrame? I know we
can cache it and look at the size in the UI, but I'm trying to do this
programmatically. With an RDD, I can sample and sum up the sizes using
SizeEstimator, then extrapolate to the entire RDD. That gives me the
approximate size of the RDD.
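
The extrapolation step described above comes down to simple arithmetic. The sketch below shows it in plain Scala. SampleExtrapolation and extrapolate are hypothetical names, and the per-row sizes are assumed to have been measured already (for example with SizeEstimator on the sampled rows).

```scala
object SampleExtrapolation {
  // Hypothetical helper: scales the summed size of a sample up to the full
  // dataset by multiplying the average bytes per sampled row by the row count.
  //   sampleBytes: total estimated bytes across the sampled rows
  //   sampleCount: number of rows in the sample
  //   totalCount:  number of rows in the full dataset
  def extrapolate(sampleBytes: Long, sampleCount: Long, totalCount: Long): Long =
    if (sampleCount <= 0) 0L // an empty sample gives no basis for an estimate
    else math.round(sampleBytes.toDouble / sampleCount * totalCount)
}
```

With Spark on the classpath, sampleBytes might be obtained with something like rdd.sample(false, 0.01).collect().map(r => SizeEstimator.estimate(r.asInstanceOf[AnyRef])).sum, and totalCount with rdd.count(). The result approximates the size of the deserialized JVM objects, not the size of a columnar cached DataFrame.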