Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
ocess). > The amount of memory required also depends on how many fields are used in > the > results. > > On Tue, Aug 9, 2016 at 11:09 AM, Zoltan Fedor <zoltan.1.fe...@gmail.com> > wrote: > >> Does this mean you only have 1.6G memory for executor (others left for >

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
tributes of the UDF? On Mon, Aug 8, 2016 at 5:59 PM, Davies Liu <dav...@databricks.com> wrote: > On Mon, Aug 8, 2016 at 2:24 PM, Zoltan Fedor <zoltan.1.fe...@gmail.com> > wrote: > > Hi all, > > > > I have an interesting issue trying to use UDFs from SparkSQL in

java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-08 Thread Zoltan Fedor
Hi all, I have an interesting issue trying to use UDFs from SparkSQL in Spark 2.0.0 using pyspark. There is a big table (5.6 billion rows, 450 GB in memory) loaded into 300 executors' memory in SparkSQL, on which we would do some calculations using UDFs in pyspark. If I run my SQL on only a
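The snippet above describes running Python UDFs over a large SparkSQL table. A minimal, hypothetical sketch of the pattern under discussion (the function name `normalize` and the table name `big_table` are illustrative, not from the thread; the pyspark parts are guarded so the plain function still runs without Spark):

```python
# Hypothetical sketch of a Python UDF in Spark 2.0 SparkSQL.
# `normalize` and `big_table` are illustrative names, not from the thread.

def normalize(s):
    # Plain Python body. Each row is serialized and shipped to a separate
    # Python worker process, which is where the extra memory pressure that
    # can surface as "GC overhead limit exceeded" comes from.
    return s.strip().lower() if s is not None else None

try:
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Wrap the plain function for use in DataFrame expressions.
    normalize_udf = udf(normalize, StringType())
    # With an active SparkSession `spark`, one would register and use it as:
    #   spark.udf.register("normalize", normalize, StringType())
    #   spark.sql("SELECT normalize(name) FROM big_table")
except Exception:
    pass  # pyspark or a working JVM is unavailable; the plain function above still works
```

Because every row crosses the JVM/Python boundary, UDF-heavy queries need noticeably more executor memory than the same query written with built-in SQL functions.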

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
ling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o20)) >>> On Thu, Oct 29, 2015 at 7:20 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote: > *Hi Zoltan* > > Add hive-site.xml to your YARN_CONF_DIR. i.e. $SPARK_HOME/conf/yarn-conf > > Deenar

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
tserver > -DskipTests clean package > > > On 29 October 2015 at 13:08, Zoltan Fedor <zoltan.0.fe...@gmail.com> > wrote: > >> Hi Deenar, >> As suggested, I have moved the hive-site.xml from HADOOP_CONF_DIR >> ($SPARK_HOME/hadoop-conf) to YARN_CONF_

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
Yes, I have the hive-site.xml in $SPARK_HOME/conf, also in yarn-conf, in /etc/hive/conf, etc On Thu, Oct 29, 2015 at 10:46 AM, Kai Wei <kai.wei...@gmail.com> wrote: > Did you try copy it to spark/conf dir? > On 30 Oct 2015 1:42 am, "Zoltan Fedor" <zoltan

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
riftserver -DskipTests clean package On Thu, Oct 29, 2015 at 11:05 AM, Kai Wei <kai.wei...@gmail.com> wrote: > Failed to see a hadoop-2.5 profile in pom. Maybe that's the problem. > On 30 Oct 2015 1:51 am, "Zoltan Fedor" <zoltan.0.fe...@gmail.com> wrote: > >> The fu

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
w a lot about how pyspark works. Can you possibly try running > spark-shell and do the same? > > sqlContext.sql("show databases").collect > > Deenar > > On 29 October 2015 at 14:18, Zoltan Fedor <zoltan.0.fe...@gmail.com> > wrote: > >> Yes, I am.

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
The funny thing is that with Spark 1.2.0 on the same machine (Spark 1.2.0 is the default shipped with CDH 5.3.3) the same hive-site.xml is being picked up and I have no problem whatsoever. On Thu, Oct 29, 2015 at 10:48 AM, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote: > Yes, I have

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
There is /user/biapp in hdfs. The problem is that the hive-site.xml is being ignored, so it is looking for it locally. On Thu, Oct 29, 2015 at 10:40 AM, Kai Wei <kai.wei...@gmail.com> wrote: > Create /user/biapp in hdfs manually first. > On 30 Oct 2015 1:36 am, "Zoltan Fe

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
rk/conf/yarn-conf to > $SPARK_HOME/conf/yarn-conf > > and it worked. You may be better off with a custom build for CDH 5.3.3 > hadoop, which you already have done. > > Deenar > > On 29 October 2015 at 14:35, Zoltan Fedor <zoltan.0.fe...@gmail.com> > wrote: > &

Re: No way to supply hive-site.xml in yarn client mode?

2015-10-29 Thread Zoltan Fedor
xt.\n', JavaObject id=o24)) >>> On Thu, Oct 29, 2015 at 11:44 AM, Deenar Toraskar <deenar.toras...@gmail.com > wrote: > > Zoltan > > you should have these in your existing CDH 5.3, that's the best place to > get them. Find where spark is running from and should

No way to supply hive-site.xml in yarn client mode?

2015-10-28 Thread Zoltan Fedor
Hi, We have a shared CDH 5.3.3 cluster and are trying to use Spark 1.5.1 on it in yarn client mode with Hive. I have compiled Spark 1.5.1 with SPARK_HIVE=true, but it seems I am not able to make SparkSQL pick up the hive-site.xml when running pyspark. hive-site.xml is located in