from:"Rajesh Balamohan"

Benchmarking with multiple users in Spark

2015-12-15 Thread Rajesh Balamohan

Hi, I am currently using Spark 1.5.2 and have been able to run benchmarks in Spark (SQL specifically) in single-user mode. For benchmarking with multiple users, I have tried some of the following approaches, but each has its own disadvantage: 1. Start the Thrift server in Spark. - Execute
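As a rough illustration of what a multi-user benchmark driver could look like, here is a minimal sketch that runs several simulated users concurrently and records per-query latency. It is not taken from the thread: `run_query`, `simulate_user`, and `benchmark` are hypothetical names, and `run_query` is only a placeholder for however queries would actually be submitted (for example, over JDBC to the Thrift server).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(sql):
    """Hypothetical stand-in for submitting one query (e.g. over JDBC)."""
    time.sleep(0.01)  # placeholder for real query execution time
    return sql


def simulate_user(user_id, queries, execute=run_query):
    """Run the queries one after another for a single user; record latencies in seconds."""
    latencies = []
    for sql in queries:
        start = time.perf_counter()
        execute(sql)
        latencies.append(time.perf_counter() - start)
    return user_id, latencies


def benchmark(num_users, queries, execute=run_query):
    """Run num_users simulated users concurrently; return {user_id: [latencies]}."""
    with ThreadPoolExecutor(max_workers=num_users) as pool:
        futures = [
            pool.submit(simulate_user, user, queries, execute)
            for user in range(num_users)
        ]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    results = benchmark(4, ["SELECT 1", "SELECT 2"])
    for user, latencies in sorted(results.items()):
        print(user, ["%.3f" % seconds for seconds in latencies])
```

Threads are enough here because each simulated user mostly waits on the remote engine; the driver itself does little work between submissions.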

Re: orc vs parquet aggregation, orc is really slow

2016-04-17 Thread Rajesh Balamohan

1. In the first case (i.e., in the cluster where you have Hive and Spark), it would have executed via HiveTableScan instead of OrcRelation. HiveTableScan does not propagate any PPD (predicate pushdown) related information to the ORC readers (SPARK-12998). PPD might not play a big role here, as your where conditions seem to be only
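To make the effect of PPD concrete, here is a toy model of why it matters to a reader that keeps min/max statistics per stripe, as ORC does. This is only a sketch of the general idea, not how Spark or the ORC reader implement it; `Stripe` and `scan` are hypothetical names introduced for the illustration.

```python
from collections import namedtuple

# Toy stripe: min/max statistics plus the values stored in the stripe.
Stripe = namedtuple("Stripe", ["min_val", "max_val", "rows"])


def scan(stripes, lower_bound, pushdown):
    """Evaluate `value > lower_bound`; return (matching rows, stripes read)."""
    matched = []
    stripes_read = 0
    for stripe in stripes:
        # With pushdown, a stripe whose max cannot satisfy the predicate
        # is skipped without reading its rows.
        if pushdown and stripe.max_val <= lower_bound:
            continue
        stripes_read += 1
        matched.extend(v for v in stripe.rows if v > lower_bound)
    return matched, stripes_read


stripes = [
    Stripe(1, 10, [1, 5, 10]),
    Stripe(11, 20, [11, 15, 20]),
    Stripe(21, 30, [21, 25, 30]),
]

if __name__ == "__main__":
    print(scan(stripes, 20, pushdown=True))
    print(scan(stripes, 20, pushdown=False))
```

Both calls return the same rows; the difference is how many stripes have to be read, which is the work a scan path loses when the predicate never reaches the reader.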