Hi,
I am currently using Spark 1.5.2 and have been able to run benchmarks in
Spark (SQL specifically) in single-user mode. For benchmarking with
multiple users, I have tried the following approaches, but each has
its own disadvantage:
1. Start the Thrift server in Spark.
- Execute
1. In the first case (i.e., in the cluster where you have Hive and Spark), it
would have been executed via HiveTableScan instead of OrcRelation.
HiveTableScan does not propagate any PPD (predicate pushdown) information to
the ORC readers (SPARK-12998). PPD might not play a big role here, as your
WHERE conditions seem to be only