Spark SQL Standalone mode missing parquet?

2015-05-05 Thread Manu Mukerji
Hi All, When I try to run Spark SQL in standalone mode, it appears to be missing the Parquet jar; I have to pass it with --jars, and that works: sbin/start-thriftserver.sh --jars lib/parquet-hive-bundle-1.6.0.jar --driver-memory 28g --master local[10] Any ideas on why? I downloaded the one pre

Recommendations for performance

2014-09-08 Thread Manu Mukerji
Hi, Let me start by saying I am new to Spark (be gentle). I have a large data set in Parquet (~1.5B rows, 900 columns). Currently Impala takes ~1-2 seconds for the queries, while Spark SQL takes ~30 seconds. Here is what I am currently doing: I launch with SPARK_MEM=6g spark-shell val

Re: Querying a parquet file in s3 with an ec2 install

2014-09-08 Thread Manu Mukerji
How big is the data set? Does it work when you copy it to HDFS? -Manu On Mon, Sep 8, 2014 at 2:58 PM, Jim Carroll jimfcarr...@gmail.com wrote: Hello all, I've been wrestling with this problem all day, and any suggestions would be greatly appreciated. I'm trying to test reading a parquet