Reading spark-env.sh from configured directory

2017-08-02 Thread Lior Chaga
Hi, I have multiple spark deployments using mesos. I use spark.executor.uri to fetch the spark distribution to the executor node. Every time I upgrade spark, I download the default distribution and just add a custom spark-env.sh to its spark/conf folder. Furthermore, any change I want to make in
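
A minimal Java sketch (not from the thread), assuming a hypothetical HDFS path to a repackaged Spark tarball that already contains the customized conf/spark-env.sh; the Mesos master URL is a placeholder:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ExecutorUriExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("executor-uri-example")
                    .setMaster("mesos://zk://zk1:2181/mesos")   // placeholder master URL
                    // Mesos executors fetch and unpack this archive before starting.
                    .set("spark.executor.uri", "hdfs:///deploy/spark-custom-dist.tgz");
            JavaSparkContext sc = new JavaSparkContext(conf);
            // ... job code ...
            sc.stop();
        }
    }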

spark on mesos cluster - metrics with graphite sink

2016-06-09 Thread Lior Chaga
Hi, I'm launching a spark application on a mesos cluster. The namespace of the metric includes the framework id for driver metrics, and both the framework id and executor id for executor metrics. These ids are obviously assigned by mesos, and they are not permanent - re-registering the application would
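
For reference, a sketch of the GraphiteSink settings involved; these keys normally live in conf/metrics.properties and are shown here as a java.util.Properties only for illustration (host, port and prefix are placeholders). The framework and executor ids the thread is about still appear inside the metric names themselves.

    import java.util.Properties;

    public class GraphiteSinkSettings {
        public static void main(String[] args) {
            Properties metrics = new Properties();
            metrics.setProperty("*.sink.graphite.class",
                    "org.apache.spark.metrics.sink.GraphiteSink");
            metrics.setProperty("*.sink.graphite.host", "graphite.example.com"); // placeholder
            metrics.setProperty("*.sink.graphite.port", "2003");
            metrics.setProperty("*.sink.graphite.period", "10");
            metrics.setProperty("*.sink.graphite.unit", "seconds");
            // Prefix prepended to every metric name reported by this application.
            metrics.setProperty("*.sink.graphite.prefix", "myapp");              // placeholder
            metrics.list(System.out);
        }
    }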

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-03-03 Thread Lior Chaga
n Yang <yy201...@gmail.com> wrote: >> >>> The default value for spark.shuffle.reduceLocality.enabled is true. >>> >>> To reduce surprise to users of 1.5 and earlier releases, should the >>> default value be set to false ? >>> >>> On Mon, Feb
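
A minimal Java sketch of the per-application workaround discussed here, turning off the Spark 1.6 reduce-task locality preference (whether it helps depends on the workload):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class DisableReduceLocality {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("reduce-locality-example")
                    // Default is true in 1.6; setting it to false spreads reduce
                    // tasks across executors as in 1.5 and earlier.
                    .set("spark.shuffle.reduceLocality.enabled", "false");
            JavaSparkContext sc = new JavaSparkContext(conf);
            // ... shuffle-heavy job ...
            sc.stop();
        }
    }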

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-29 Thread Lior Chaga
nd see > the distribution change... fully reproducible > > On Sun, Feb 28, 2016 at 11:24 AM, Lior Chaga <lio...@taboola.com> wrote: > >> Hi, >> I've experienced a similar problem upgrading from spark 1.4 to spark 1.6. >> The data is not evenly distributed across exe

Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-28 Thread Lior Chaga
Hi, I've experienced a similar problem upgrading from spark 1.4 to spark 1.6. The data is not evenly distributed across executors, but in my case it is also reproduced in legacy mode. I also tried 1.6.1 rc-1, with the same results. Still looking for a resolution. Lior On Fri, Feb 19, 2016 at 2:01 AM,

deleting application files in standalone cluster

2015-08-09 Thread Lior Chaga
Hi, Using spark 1.4.0 in standalone mode, with the following configuration: SPARK_WORKER_OPTS=-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=86400 The cleanup interval is set to the default. Application files are not deleted. Using JavaSparkContext, and when the application ends it
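
One thing worth checking (an assumption, not the thread's conclusion): the standalone worker cleaner only removes directories of applications it considers finished, so the context should be stopped explicitly. A minimal Java sketch:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class CleanShutdown {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                    new SparkConf().setAppName("clean-shutdown-example"));
            try {
                // ... job code ...
            } finally {
                // Stop the context instead of relying on JVM exit, so the worker
                // sees the application as finished and can reclaim its files.
                sc.stop();
            }
        }
    }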

Use rank with distribute by in HiveContext

2015-07-16 Thread Lior Chaga
Does spark HiveContext support the rank() ... distribute by syntax (as in the following article- http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive )? If not, how can it be achieved? Thanks, Lior
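
One way to get the same result in Spark 1.4+ is the RANK() window function in HiveContext (a sketch, not necessarily what the thread settled on; the table and columns are hypothetical):

    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.hive.HiveContext;

    public class RankExample {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext();
            HiveContext hiveContext = new HiveContext(sc.sc());
            // RANK() OVER (PARTITION BY ...) is the window-function equivalent of
            // the old Hive "distribute by / sort by + rank UDF" pattern.
            DataFrame ranked = hiveContext.sql(
                    "SELECT category, item, score, " +
                    "       RANK() OVER (PARTITION BY category ORDER BY score DESC) AS rnk " +
                    "FROM scores");
            ranked.show();
        }
    }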

Re: spark sql - group by constant column

2015-07-15 Thread Lior Chaga
at 10:09 AM, Lior Chaga lio...@taboola.com wrote: Hi, Facing a bug with group by in SparkSQL (version 1.4). Registered a JavaRDD with object containing integer fields as a table. Then I'm trying to do a group by, with a constant value in the group by fields: SELECT primary_one

spark sql - group by constant column

2015-07-15 Thread Lior Chaga
Hi, Facing a bug with group by in SparkSQL (version 1.4). I registered a JavaRDD of objects containing integer fields as a table. Then I'm trying to do a group by, with a constant value in the group by fields: SELECT primary_one, primary_two, 10 as num, SUM(measure) as total_measures FROM tbl
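
A common workaround (an assumption, not confirmed by this thread) is to select the constant without grouping by it, so no literal appears in the GROUP BY clause. A sketch, assuming "tbl" was registered from the JavaRDD as described:

    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class GroupByConstantWorkaround {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext();
            SQLContext sqlContext = new SQLContext(sc.sc());
            DataFrame result = sqlContext.sql(
                    "SELECT primary_one, primary_two, 10 AS num, " +
                    "       SUM(measure) AS total_measures " +
                    "FROM tbl " +
                    "GROUP BY primary_one, primary_two");   // constant left out of GROUP BY
            result.show();
        }
    }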

Re: Help optimising Spark SQL query

2015-06-22 Thread Lior Chaga
Hi James, There are a few configurations that you can try: https://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options From my experience, the codegen really boosts things up. Just run sqlContext.sql(spark.sql.codegen=true) before you execute your query. But keep
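
A sketch of the 1.x codegen switch mentioned above (the table and query are placeholders); setConf is the programmatic equivalent of the SET statement:

    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class CodegenExample {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext();
            SQLContext sqlContext = new SQLContext(sc.sc());
            // Enable runtime code generation for expression evaluation (Spark 1.x).
            sqlContext.setConf("spark.sql.codegen", "true");
            // SQL form: sqlContext.sql("SET spark.sql.codegen=true");
            DataFrame result = sqlContext.sql("SELECT COUNT(*) FROM tbl");  // placeholder query
            result.show();
        }
    }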

Re: spark sql hive-shims

2015-05-14 Thread Lior Chaga
I see that the pre-built distributions include hive-shims-0.23 shaded in the spark-assembly jar (unlike when I make the distribution myself). Does anyone know what I should do to include the shims in my distribution? On Thu, May 14, 2015 at 9:52 AM, Lior Chaga lio...@taboola.com wrote

Re: spark sql hive-shims

2015-05-14 Thread Lior Chaga
After profiling with YourKit, I see there's an OutOfMemoryException in context SQLContext.applySchema. Again, it's a very small RDD. Each executor has 180GB RAM. On Thu, May 14, 2015 at 8:53 AM, Lior Chaga lio...@taboola.com wrote: Hi, Using spark sql with HiveContext. Spark version is 1.3.1

Re: spark sql hive-shims

2015-05-14 Thread Lior Chaga
Ultimately it was a PermGen out-of-memory error. I somehow missed it in the log. On Thu, May 14, 2015 at 9:24 AM, Lior Chaga lio...@taboola.com wrote: After profiling with YourKit, I see there's an OutOfMemoryException in context SQLContext.applySchema. Again, it's a very small RDD. Each executor has
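
Since the root cause was PermGen, a hedged sketch of raising it for executors on Java 7-era JVMs (the size is a placeholder; the driver-side equivalent is normally passed on the spark-submit command line rather than set in code):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class PermGenExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("permgen-example")
                    // PermGen is sized separately from the heap, so even a 180GB
                    // executor can hit "PermGen space".
                    .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=512m");
            JavaSparkContext sc = new JavaSparkContext(conf);
            // ... HiveContext / SQL work ...
            sc.stop();
        }
    }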

spark sql hive-shims

2015-05-13 Thread Lior Chaga
Hi, Using spark sql with HiveContext. Spark version is 1.3.1. When running local spark everything works fine. When running on a spark cluster I get a ClassNotFoundError for org.apache.hadoop.hive.shims.Hadoop23Shims. This class belongs to hive-shims-0.23, and is a runtime dependency for spark-hive:

mapping JavaRDD to jdbc DataFrame

2015-05-04 Thread Lior Chaga
Hi, I'd like to use a JavaRDD containing parameters for an SQL query, and use SparkSQL jdbc to load data from MySQL. Consider the following pseudo code: JavaRDD<String> namesRdd = ... ; ... options.put("url", "jdbc:mysql://mysql?user=usr"); options.put("password", "pass"); options.put("dbtable", "(SELECT *
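
A sketch of one possible approach (an assumption, reasonable only for a small parameter set): collect the names from the RDD and inline them into the subquery pushed down as dbtable. Connection details and table names are placeholders; the read() form is the 1.4+ API, in 1.3 the equivalent is sqlContext.load("jdbc", options).

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class JdbcParamsExample {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext();
            SQLContext sqlContext = new SQLContext(sc.sc());

            JavaRDD<String> namesRdd = sc.parallelize(Arrays.asList("alice", "bob"));

            // Small lists only: collect and inline into an IN (...) clause.
            List<String> names = namesRdd.collect();
            String inList = "'" + String.join("','", names) + "'";

            Map<String, String> options = new HashMap<>();
            options.put("url", "jdbc:mysql://mysql/mydb?user=usr&password=pass"); // placeholders
            options.put("dbtable",
                    "(SELECT * FROM people WHERE name IN (" + inList + ")) AS t");

            DataFrame df = sqlContext.read().format("jdbc").options(options).load();
            df.show();
        }
    }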

Date class not supported by SparkSQL

2015-04-19 Thread Lior Chaga
Using Spark 1.2.0. Tried to register an RDD and got: scala.MatchError: class java.util.Date (of class java.lang.Class) I see it was resolved in https://issues.apache.org/jira/browse/SPARK-2562 (included in 1.2.0). Has anyone encountered this issue? Thanks, Lior
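
For reference, a sketch of a bean that schema inference does accept: java.sql.Timestamp (or java.sql.Date) instead of java.util.Date; the follow-up in this thread simply stored a Long. The bean and fields are hypothetical.

    import java.io.Serializable;
    import java.sql.Timestamp;

    // java.util.Date is not a recognized bean field type for schema inference,
    // but java.sql.Timestamp is.
    public class EventBean implements Serializable {
        private Timestamp eventTime;
        private long value;

        public Timestamp getEventTime() { return eventTime; }
        public void setEventTime(Timestamp eventTime) { this.eventTime = eventTime; }

        public long getValue() { return value; }
        public void setValue(long value) { this.value = value; }
    }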

Re: Date class not supported by SparkSQL

2015-04-19 Thread Lior Chaga
; } public void setValue(Long value) { this.value = value; } } } On Sun, Apr 19, 2015 at 4:27 PM, Lior Chaga lio...@taboola.com wrote: Using Spark 1.2.0. Tried to apply register an RDD and got: scala.MatchError: class java.util.Date (of class java.lang.Class) I see

using log4j2 with spark

2015-03-04 Thread Lior Chaga
Hi, I'm trying to run spark 1.2.1 w/ hadoop 1.0.4 on a cluster and configure it to run with log4j2. The problem is that spark-assembly.jar contains log4j and slf4j classes compatible with log4j 1.2, and so it detects it should use log4j 1.2 (
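
One commonly suggested direction (an assumption, not this thread's resolution): put the log4j2 jars and their slf4j binding ahead of the assembly on the executor classpath and point log4j2 at its own configuration file. Paths and versions are placeholders.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class Log4j2Classpath {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("log4j2-example")
                    .set("spark.executor.extraClassPath",
                         "/opt/log4j2/log4j-api.jar:/opt/log4j2/log4j-core.jar:" +
                         "/opt/log4j2/log4j-slf4j-impl.jar")                       // placeholders
                    .set("spark.executor.extraJavaOptions",
                         "-Dlog4j.configurationFile=file:/opt/log4j2/log4j2.xml"); // placeholder
            JavaSparkContext sc = new JavaSparkContext(conf);
            // ...
            sc.stop();
        }
    }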