Hi Bala,

It depends on how your Hive table is configured. If the table is partitioned and 
you are filtering on a partition column, then Spark will only load the relevant 
partitions. If, however, you're filtering on a non-partition column, then it 
will have to read all the data and apply the filter as part of the Spark job.
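For example, a quick way to see this is to check the physical plan with 
explain(). Here's a minimal sketch in Scala (the employee/deptno names are just 
taken from your example, and this assumes a Hive-enabled sqlContext, e.g. a 
HiveContext in spark-shell built with Hive support):

    // Hive table assumed to be partitioned by deptno, e.g.:
    //   CREATE TABLE employee (name STRING, salary DOUBLE)
    //   PARTITIONED BY (deptno INT) STORED AS PARQUET;

    val df = sqlContext.sql("SELECT * FROM employee WHERE deptno = 10")

    // With the partitioned layout above, the physical plan should show a
    // scan limited to the deptno=10 partition. If the WHERE clause used a
    // non-partition column instead, you'd see a scan over all the data
    // followed by a Filter step.
    df.explain()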

Thanks,
Silvio

From: "Balaraju.Kagidala Kagidala" 
<balaraju.kagid...@gmail.com<mailto:balaraju.kagid...@gmail.com>>
Date: Thursday, January 21, 2016 at 9:37 PM
To: "user@spark.apache.org<mailto:user@spark.apache.org>" 
<user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: General Question (Spark Hive integration)

Hi,


  I have a simple question regarding Spark Hive integration with DataFrames.

When we query a table, does Spark load the whole table into memory and 
apply the filter on top of it, or does it only load the data with the filter 
already applied?

For example, with the query 'select * from employee where deptno=10', does my 
RDD load the whole employee table into memory and apply the filter, or will it 
load only the data for dept number 10?


Thanks
Bala




