Hi:
I want to get the total size in bytes of a DataFrame using the function below. However, after inserting the DataFrame into Hive, I found that the value returned by the function differs from spark.sql.statistics.totalSize: spark.sql.statistics.totalSize is smaller than the result of getRDDBytes.

    import org.apache.spark.sql.DataFrame

    def getRDDBytes(df: DataFrame): Long = {
      df.rdd.getNumPartitions match {
        case 0 => 0L
        case _ =>
          // Size of each row = UTF-8 byte length of the row's string representation
          val rddOfDataframe = df.rdd.map(_.toString().getBytes("UTF-8").length.toLong)
          // Sum the per-row sizes; an empty RDD contributes 0
          val size = if (rddOfDataframe.isEmpty()) 0L else rddOfDataframe.reduce(_ + _)
          size
      }
    }

I would appreciate any suggestions.

Best Regards
Kelly Zhang