Re: Spark input size when filtering on parquet files

2016-06-01 Dennis Hunziker
> I'm afraid this metric does not reflect the actual number of bytes read
> for Parquet. If you need that number, you have to use other tools such as
> iostat.
>
> // maropu
>
> On Fri, May 27, 2016 at 5:45 AM, Dennis Hunziker <
> dennis.hunz

Spark input size when filtering on parquet files

2016-05-26 Dennis Hunziker
Hi all, I was looking into Spark 1.6.1 (Parquet 1.7.0, Hive 1.2.1) to find out about the improvements made in filtering and scanning Parquet files when querying tables using SparkSQL, and how these changes relate to the new filter API introduced in Parquet 1.7.0. After checking the
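Background on why a filter can reduce Parquet scanning at all: Parquet keeps min/max statistics per row group, so a reader given a predicate can skip row groups whose value range cannot match. The following is a simplified Python model of that idea, not Spark's or parquet-mr's implementation. The `RowGroup` class, the single integer column, and the byte sizes are hypothetical. It only shows that the bytes a reader actually needs can be much smaller than the total file size, which is why a file-level size figure and real I/O can disagree.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroup:
    # Hypothetical per-row-group metadata: min/max of one integer column
    # plus the compressed size of the row group on disk.
    min_id: int
    max_id: int
    size_bytes: int


def may_match_eq(rg: RowGroup, value: int) -> bool:
    # A row group can be skipped only when the value lies outside [min, max];
    # otherwise it *may* contain matching rows and must be read.
    return rg.min_id <= value <= rg.max_id


def bytes_to_scan(row_groups: List[RowGroup], value: int) -> int:
    # Sum the sizes of the row groups that survive statistics-based pruning
    # for the predicate `id == value`.
    return sum(rg.size_bytes for rg in row_groups if may_match_eq(rg, value))


groups = [
    RowGroup(0, 999, 4_000_000),
    RowGroup(1000, 1999, 4_000_000),
    RowGroup(2000, 2999, 4_000_000),
]
total = sum(rg.size_bytes for rg in groups)
scanned = bytes_to_scan(groups, 1500)
print(total, scanned)  # prints "12000000 4000000"
```

Real readers also apply the predicate row by row after pruning, since min/max statistics can only rule row groups out, never confirm a match.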