[Spark SQL] dependencies to use test helpers

2019-07-24 Thread James Pirz
I have a Scala application in which I have added some extra rules to Catalyst. While adding some unit tests, I am trying to use some existing functions from Catalyst's test code, specifically comparePlans() and normalizePlan() under PlanTestBase.
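The usual way to get at those helpers is to depend on the test-jar artifacts that Spark publishes under the "tests" classifier. A minimal build.sbt sketch, with an illustrative Spark version:

```scala
// build.sbt -- sketch; pin sparkVersion to the version your rules build against
val sparkVersion = "2.4.3" // illustrative

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  // PlanTestBase (with comparePlans and normalizePlan) lives in the test
  // sources of spark-catalyst, published under the "tests" classifier.
  "org.apache.spark" %% "spark-catalyst" % sparkVersion % Test classifier "tests",
  // Suites mixing in these helpers typically also need SparkFunSuite,
  // which ships in the spark-core test jar.
  "org.apache.spark" %% "spark-core" % sparkVersion % Test classifier "tests"
)
```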

Re: Setting executors per worker - Standalone

2015-09-29 Thread James Pirz
…spark-graphx-in-action > On 29 Sep 2015, at 04:47, James Pirz <james.p...@gmail.com> wrote: > Thanks for your reply. Setting it as --conf spark.executor.cores=1 when I start spark-shell (as an example application) indeed sets the
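For reference, the flag quoted above has a programmatic equivalent when constructing the context; a minimal sketch, with a placeholder master URL and illustrative memory size:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("executors-per-worker")
  .setMaster("spark://master-host:7077") // placeholder master URL
  // Each executor claims 1 core, so a worker advertising 4 cores can
  // host up to 4 executors for this application.
  .set("spark.executor.cores", "1")
  .set("spark.executor.memory", "2g")    // illustrative; keep 4x this under worker memory

val sc = new SparkContext(conf)
```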

Re: Setting executors per worker - Standalone

2015-09-28 Thread James Pirz
…per worker, since you have 4 cores per worker. > On Tue, Sep 29, 2015 at 8:24 AM, James Pirz <james.p...@gmail.com> wrote: >> Hi, I am using Spark 1.5 (standalone mode) on a cluster with 10 nodes, where each machine has 12GB of RAM and

Setting executors per worker - Standalone

2015-09-28 Thread James Pirz
Hi, I am using Spark 1.5 (standalone mode) on a cluster with 10 nodes, where each machine has 12GB of RAM and 4 cores. On each machine I have one worker running one executor that grabs all 4 cores. I am interested in checking the performance with "one worker but 4 executors per machine -
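Besides the master web UI on port 8080, a quick way to confirm how many executors actually registered is the block-manager listing from within spark-shell; a sketch using SparkContext.getExecutorMemoryStatus:

```scala
// One entry for the driver plus one per executor. With
// spark.executor.cores=1 on 4-core workers, expect 4 executor
// entries per machine.
sc.getExecutorMemoryStatus.keys.foreach(println)
```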

Repartitioning external table in Spark SQL

2015-08-18 Thread James Pirz
I am using Spark 1.4.1, in standalone mode, on a cluster of 3 nodes. Using Spark SQL and HiveContext, I am trying to run a simple scan query on an existing Hive table (which is an external table consisting of rows in text files stored in HDFS - it is NOT Parquet, ORC or any other richer
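One common pattern for this situation (not necessarily what the thread settled on): scan the table through a HiveContext and repartition the resulting DataFrame. A sketch for Spark 1.4, with a hypothetical table name and an illustrative partition count:

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)  // sc: an existing SparkContext

val df = hiveContext.table("lineitem") // hypothetical table name

// Redistribute the rows across more partitions so downstream stages
// use all cores; this triggers a full shuffle.
val repartitioned = df.repartition(36) // illustrative: ~3x total cores
repartitioned.registerTempTable("lineitem_repart")
```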

Re: worker and executor memory

2015-08-14 Thread James Pirz
…are scheduled that way, as it is a map-only job and reading can happen in parallel. On Thu, Aug 13, 2015 at 9:10 PM, James Pirz <james.p...@gmail.com> wrote: Hi, I am using Spark 1.4 on a cluster (standalone mode), across 3 machines, for a workload similar to TPC-H (analytical queries with multiple

worker and executor memory

2015-08-13 Thread James Pirz
Hi, I am using Spark 1.4 on a cluster (standalone mode), across 3 machines, for a workload similar to TPC-H (analytical queries with multiple/multi-way large joins and aggregations). Each machine has 12GB of memory and 4 cores. My total data size is 150GB, stored in HDFS (stored as Hive tables),
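For sizing on a setup like this, two knobs interact: SPARK_WORKER_MEMORY caps what a worker can hand out, and spark.executor.memory is what each executor requests. Under the pre-1.6 memory model that Spark 1.4 uses, cache capacity works out roughly as sketched below (numbers illustrative):

```scala
// Legacy (pre-1.6) memory model, which applies to Spark 1.4:
//   storage capacity ~= spark.executor.memory
//                       * spark.storage.memoryFraction (default 0.6)
//                       * spark.storage.safetyFraction (default 0.9)
val executorMemoryGb = 8.0                   // illustrative: leaves ~4GB for OS/HDFS daemons
val storageGb = executorMemoryGb * 0.6 * 0.9 // ~4.3GB usable cache per node
val clusterCacheGb = storageGb * 3           // ~13GB across 3 nodes -- far below 150GB,
                                             // so caching the full dataset is not an option
```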

Re: spark-submit does not use hive-site.xml

2015-06-10 Thread James Pirz
…to communicate with the Hive metastore. So your program needs to instantiate an `org.apache.spark.sql.hive.HiveContext` instead. Cheng On 6/10/15 10:19 AM, James Pirz wrote: I am using Spark (standalone) to run queries (from a remote client) against data in tables that are already defined/loaded
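Spelling out the suggestion above: a minimal Spark 1.x sketch that builds a HiveContext (which, unlike the plain SQLContext, reads hive-site.xml from the classpath and talks to the metastore). The table name is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("hive-query"))

// HiveContext picks up hive-site.xml from the classpath and connects
// to the configured metastore.
val hiveContext = new HiveContext(sc)

hiveContext.sql("SELECT COUNT(*) FROM my_table").show() // hypothetical table
```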

Re: Running Spark SQL against Hive tables

2015-06-09 Thread James Pirz
…to connect to Hive, which should work even without Spark. Best, Ayan On Tue, Jun 9, 2015 at 10:42 AM, James Pirz <james.p...@gmail.com> wrote: Thanks for the help! I am actually trying Spark SQL to run queries against tables that I've defined in Hive. I follow these steps: - I start

Re: Running Spark SQL against Hive tables

2015-06-09 Thread James Pirz
…a query file with the -f flag). Looking at the Spark SQL documentation, it seems that this is possible. Please correct me if I am wrong. On Mon, Jun 8, 2015 at 6:56 PM, Cheng Lian <lian.cs@gmail.com> wrote: On 6/9/15 8:42 AM, James Pirz wrote: Thanks for the help! I am actually trying Spark SQL to run
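The -f flag belongs to the spark-sql CLI; from application code, one way to approximate it is to read the script and feed each statement to sql(). A rough sketch, assuming a HiveContext as above, a hypothetical file path, and naive splitting on ';':

```scala
import scala.io.Source

// Naive ';' splitting: fine for simple scripts, wrong for statements
// that contain semicolons inside string literals.
val script = Source.fromFile("/path/to/queries.sql").mkString // hypothetical path
script.split(";").map(_.trim).filter(_.nonEmpty).foreach { stmt =>
  hiveContext.sql(stmt).show()
}
```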

spark-submit does not use hive-site.xml

2015-06-09 Thread James Pirz
I am using Spark (standalone) to run queries (from a remote client) against data in tables that are already defined/loaded in Hive. I have started the metastore service in Hive successfully, and by putting hive-site.xml, with the proper hive.metastore.uris, in the $SPARK_HOME/conf directory, I tried to share its
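If hive-site.xml is not being picked up on the driver, one workaround is to point at the metastore programmatically; a sketch with a placeholder thrift URI (9083 is the default metastore port):

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// Bypass classpath lookup of hive-site.xml and point straight at the
// running metastore service.
hiveContext.setConf("hive.metastore.uris", "thrift://metastore-host:9083")
```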

Re: Running Spark SQL against Hive tables

2015-06-08 Thread James Pirz
…that would be highly appreciated. Thanks On Sun, Jun 7, 2015 at 6:39 AM, Cheng Lian <lian.cs@gmail.com> wrote: On 6/6/15 9:06 AM, James Pirz wrote: I am pretty new to Spark, and using Spark 1.3.1, I am trying to use Spark SQL to run some SQL scripts on the cluster. I realized