Hey Yong,
It seems that Hadoop `FileSystem` adds the size of a whole block to its
read metrics even if you only touch a fraction of it (when reading
Parquet metadata, for example). This behavior can be verified with the
following snippet:
```scala
import org.apache.spark.sql.Row
import
```
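The over-counting described above can be illustrated with a toy model (plain Python, nothing Hadoop-specific; `BLOCK_SIZE` and `BlockCountingReader` are made-up names for illustration): a block-oriented reader charges its bytes-read counter per block touched, not per byte delivered, so a tiny footer read still accounts for a full block.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS-style 128 MiB block size (illustrative)

class BlockCountingReader:
    """Toy model of block-granularity read accounting.

    Mimics the behavior described above: the bytes-read metric grows
    by whole blocks, even when the caller asks for only a few bytes.
    """

    def __init__(self, file_size):
        self.file_size = file_size
        self.bytes_read_metric = 0   # what the metric would report
        self._blocks_touched = set()

    def read(self, offset, length):
        # Work out which blocks this request touches.
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for b in range(first, last + 1):
            if b not in self._blocks_touched:
                self._blocks_touched.add(b)
                # Charge the whole block, even for a tiny read.
                block_start = b * BLOCK_SIZE
                block_len = min(BLOCK_SIZE, self.file_size - block_start)
                self.bytes_read_metric += block_len
        return length  # bytes actually delivered to the caller

# A 1 GiB file: reading just an 8-byte footer charges one full block.
reader = BlockCountingReader(file_size=1024 * 1024 * 1024)
actual = reader.read(offset=1024 * 1024 * 1024 - 8, length=8)
print(actual)                    # 8
print(reader.bytes_read_metric)  # 134217728
```

This is why reading only the Parquet footer of a large file can make the bytes-read metric look as if a whole block was consumed.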
Hi,

Currently most of the data in our production is stored as Avro + Snappy. I want
to test the benefits of storing the data in Parquet format instead. I changed our
ETL to generate Parquet instead of Avro, and I want to run a simple
SQL query in Spark SQL to verify the benefits of Parquet.
I