Hi,
I ran some tests on Parquet files with the Spark SQL DataFrame API.
I generated 36 gzip-compressed Parquet files with Spark SQL and stored them on
Tachyon; each file is about 222 MB. Then I read them with the code below.
val tfs = sqlContext.parquetFile("tachyon://datanode8.bitauto.dmp:19998/apps/tachyon/adClick");
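
For reference, the generation step looked roughly like this (a sketch, not the
exact job; "adClicks" is a stand-in for my real source DataFrame, and I set the
codec explicitly via the standard spark.sql.parquet.compression.codec setting):

// Sketch of the write that produced the 36 gzip-compressed files.
// "adClicks" stands in for the real source DataFrame.
sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip")
adClicks
  .repartition(36) // one output file per partition
  .saveAsParquetFile("tachyon://datanode8.bitauto.dmp:19998/apps/tachyon/adClick")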
Next, I saved this DataFrame to HDFS with the code below. It also generates 36
Parquet files, but each file is about 265 MB.
tfs.repartition(36).saveAsParquetFile("/user/zhangxf/adClick-parquet-tachyon");
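
As a sanity check (again just a sketch, paths as above), both copies hold the
same rows; only the on-disk size differs:

// Quick check that the row counts match between the two copies.
val hfs = sqlContext.parquetFile("/user/zhangxf/adClick-parquet-tachyon")
println(s"Tachyon rows: ${tfs.count()}, HDFS rows: ${hfs.count()}")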
My question is: why do the files on HDFS have a different size from those on
Tachyon, even though they come from the same original data?


Thanks
Zhang Xiongfei
