When you use SQL (or the SchemaRDD/DataFrame API) to read data from Parquet,
the optimizer will do column pruning, predicate pushdown, etc., so you get
the benefits of Parquet's columnar format. After that, you can operate on the
SchemaRDD (DataFrame) like a regular RDD.
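
For example, something along these lines (a rough sketch against the
Spark 1.3-style DataFrame API; the SchemaRDD API in 1.2 looks almost the
same, and the path and column names here are made up for illustration):

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._ // pair-RDD ops on older versions
  import org.apache.spark.sql.SQLContext

  val sc = new SparkContext("local[*]", "parquet-example")
  val sqlContext = new SQLContext(sc)

  // Going through the SQL/DataFrame layer lets the optimizer push the
  // filter and the column selection down into the Parquet scan, so only
  // the needed columns are read from disk.
  val df = sqlContext.parquetFile("hdfs:///data/events.parquet")
    .filter("amount > 100")
    .select("userId", "amount")

  // From here you can treat it as a regular RDD of Rows and apply any
  // low-level constructs (map, reduceByKey, etc.).
  val totals = df.rdd
    .map(row => (row.getString(0), row.getDouble(1)))
    .reduceByKey(_ + _)

  totals.take(10).foreach(println)

If you build the RDD outside the SQL layer instead, you typically won't get
the column pruning and predicate pushdown for free, so reading through
SQL/DataFrame first and then dropping down to the RDD is usually the way to go.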

Thanks.

Zhan Zhang
 
On Feb 26, 2015, at 1:50 PM, tridib <tridib.sama...@live.com> wrote:

> Hello Experts,
> In one of my projects we have Parquet files and we are using Spark SQL
> for our analytics. I am encountering situations where a simple SQL query does not
> get me what I need, or the complex SQL is not supported by Spark SQL. In
> scenarios like this I am able to get things done using low-level Spark
> constructs like MapFunction and reducers.
> 
> My question is: if I create a JavaSchemaRDD on Parquet and use basic Spark
> constructs, will I still get the benefit of Parquet's columnar format? Will
> my aggregation be as fast as it would have been if I had used SQL?
> 
> Please advise.
> 
> Thanks & Regards
> Tridib
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-function-on-parquet-without-sql-tp21833.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 

