Re: Creating RDD from only few columns of a Parquet file

2015-01-13 Thread Ajay Srivastava
Setting spark.sql.hive.convertMetastoreParquet to true has fixed this.

Regards,
Ajay

On Tuesday, January 13, 2015 11:50 AM, Ajay Srivastava a_k_srivast...@yahoo.com.INVALID wrote:
Hi, I am trying to read a parquet file using -
val parquetFile = sqlContext.parquetFile("people.parquet")
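For reference, the setting mentioned above can be applied on the SQLContext before reading. A minimal sketch, assuming a Spark 1.2-era HiveContext named sqlContext already exists in the application:

```scala
// Sketch: assumes sqlContext is a HiveContext created by the application.
// Enabling this flag makes Spark SQL use its native Parquet support
// (which includes column pruning) for Hive metastore Parquet tables,
// instead of the Hive SerDe path.
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")

val parquetFile = sqlContext.parquetFile("people.parquet")
```

This is a configuration fragment; it only has an effect for tables backed by the Hive metastore.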

Re: Creating RDD from only few columns of a Parquet file

2015-01-13 Thread Reynold Xin
What query did you run? Parquet should have predicate and column pushdown, i.e. if your query only needs to read 3 columns, then only 3 will be read.

On Mon, Jan 12, 2015 at 10:20 PM, Ajay Srivastava a_k_srivast...@yahoo.com.invalid wrote:
Hi, I am trying to read a parquet file using -
val
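The pushdown behavior described above can be sketched as follows (Spark 1.2-era API; the table and column names here are hypothetical, and a running Spark context is assumed):

```scala
// Sketch: column pruning and predicate pushdown via a SQL query.
// Assumes sqlContext and a Parquet file "people.parquet" exist.
val people = sqlContext.parquetFile("people.parquet")
people.registerTempTable("people")

// Only the three referenced columns are read from disk; Parquet's
// columnar layout lets the reader skip the other columns entirely,
// and the filter on age can be pushed down to the Parquet scan.
val subset = sqlContext.sql("SELECT name, age, city FROM people WHERE age > 30")
```

No explicit column list is needed when loading the file; the projection in the query is what drives which columns are actually scanned.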

Creating RDD from only few columns of a Parquet file

2015-01-12 Thread Ajay Srivastava
Hi, I am trying to read a parquet file using -
val parquetFile = sqlContext.parquetFile("people.parquet")

There is no way to specify that I am interested in reading only some columns from disk. For example, if the parquet file has 10 columns and I want to read only 3 columns from disk. We have done
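As the replies above note, projecting only the needed columns is enough; the Parquet reader then fetches just those columns. A hedged sketch of reading 3 of 10 columns (Spark 1.2-era API; the column names are hypothetical):

```scala
// Sketch: assumes sqlContext exists and "people.parquet" has 10 columns.
val parquetFile = sqlContext.parquetFile("people.parquet")
parquetFile.registerTempTable("people")

// Of the 10 columns in the file, only these 3 are scanned from disk.
val threeCols = sqlContext.sql("SELECT name, age, salary FROM people")
threeCols.collect().foreach(println)
```

The resulting SchemaRDD can then be used like any other RDD, so no separate column-selection API is required at load time.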