
Re: From Spark web ui, how to prove the parquet column pruning working

2015-03-15 Thread Cheng Lian

Hey Yong, it seems that Hadoop `FileSystem` adds the size of a block to the metrics even if you only touch a fraction of it (reading Parquet metadata, for example). This behavior can be verified with the following snippet: ```scala import org.apache.spark.sql.Row import

From Spark web ui, how to prove the parquet column pruning working

2015-03-09 Thread java8964

Hi, Currently most of the data in our production environment is stored as Avro + Snappy. I want to test the benefits of storing the data in Parquet format instead. I changed our ETL to generate Parquet instead of Avro, and want to run a simple SQL query in Spark SQL to verify the benefits of Parquet. I