Re: Exception when using cosh

2015-10-21 Thread Reynold Xin
I think we made a mistake and forgot to register the function in the
registry:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

Do you mind submitting a pull request to fix this? It should be a one-line
change. I filed a ticket to track this:
https://issues.apache.org/jira/browse/SPARK-11233
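
For reference, the registration table in FunctionRegistry.scala is a list of
expression[...] entries; a sketch of the shape the one-line fix would
presumably take, based on the neighboring entries (illustrative, not the
actual patch):

    expression[Sinh]("sinh"),
    expression[Cosh]("cosh"),   // the missing registration
    expression[Tanh]("tanh"),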




On Wed, Oct 21, 2015 at 2:30 AM, Shagun Sodhani 
wrote:

> Hi! I was trying out different arithmetic functions in SparkSql and noticed
> a weird thing: while the *sinh* and *tanh* functions work, using *cosh*
> results in an error saying:
>
> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
> undefined function cosh;*
>
> The documentation says *cosh* has been implemented since 1.4, and I find it
> weird that *cosh* would fail when *tanh* and *sinh* are implemented (and
> working). I looked for it on JIRA but could not find any related bug. Could
> someone confirm whether this is an actual issue or something wrong on my
> part?
>
> Query I am using: SELECT cosh(`age`) as `data` FROM `table`
> Spark Version: 10.4
> SparkSql Version: 1.5.1
>
> I am using the standard example of (name, age) schema (though I am setting
> age as Double and not Int as I am trying out maths functions).
>
> The entire error stack can be found here.
>
> Thanks!
>
> Shagun
>


Re: Exception when using cosh

2015-10-21 Thread Shagun Sodhani
Sure! Will do that.

Thanks a lot

On Wed, Oct 21, 2015 at 10:59 PM, Reynold Xin  wrote:

> I think we made a mistake and forgot to register the function in the
> registry:
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>
> Do you mind submitting a pull request to fix this? It should be a one-line
> change. I filed a ticket to track this:
> https://issues.apache.org/jira/browse/SPARK-11233
>
>
>
>
> On Wed, Oct 21, 2015 at 2:30 AM, Shagun Sodhani 
> wrote:
>
>> Hi! I was trying out different arithmetic functions in SparkSql and
>> noticed a weird thing: while the *sinh* and *tanh* functions work, using
>> *cosh* results in an error saying:
>>
>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>> undefined function cosh;*
>>
>> The documentation says *cosh* has been implemented since 1.4, and I find
>> it weird that *cosh* would fail when *tanh* and *sinh* are implemented
>> (and working). I looked for it on JIRA but could not find any related
>> bug. Could someone confirm whether this is an actual issue or something
>> wrong on my part?
>>
>> Query I am using: SELECT cosh(`age`) as `data` FROM `table`
>> Spark Version: 10.4
>> SparkSql Version: 1.5.1
>>
>> I am using the standard example of (name, age) schema (though I am
>> setting age as Double and not Int as I am trying out maths functions).
>>
>> The entire error stack can be found here.
>>
>> Thanks!
>>
>> Shagun
>>
>
>


Re: Exception when using cosh

2015-10-21 Thread Shagun Sodhani
@Reynold submitted the PR: https://github.com/apache/spark/pull/9199

On Wed, Oct 21, 2015 at 11:01 PM, Shagun Sodhani 
wrote:

> Sure! Will do that.
>
> Thanks a lot
>
> On Wed, Oct 21, 2015 at 10:59 PM, Reynold Xin  wrote:
>
>> I think we made a mistake and forgot to register the function in the
>> registry:
>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>
>> Do you mind submitting a pull request to fix this? It should be a one-line
>> change. I filed a ticket to track this:
>> https://issues.apache.org/jira/browse/SPARK-11233
>>
>>
>>
>>
>> On Wed, Oct 21, 2015 at 2:30 AM, Shagun Sodhani wrote:
>>
>>> Hi! I was trying out different arithmetic functions in SparkSql and
>>> noticed a weird thing: while the *sinh* and *tanh* functions work, using
>>> *cosh* results in an error saying:
>>>
>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>> undefined function cosh;*
>>>
>>> The documentation says *cosh* has been implemented since 1.4, and I find
>>> it weird that *cosh* would fail when *tanh* and *sinh* are implemented
>>> (and working). I looked for it on JIRA but could not find any related
>>> bug. Could someone confirm whether this is an actual issue or something
>>> wrong on my part?
>>>
>>> Query I am using: SELECT cosh(`age`) as `data` FROM `table`
>>> Spark Version: 10.4
>>> SparkSql Version: 1.5.1
>>>
>>> I am using the standard example of (name, age) schema (though I am
>>> setting age as Double and not Int as I am trying out maths functions).
>>>
>>> The entire error stack can be found here.
>>>
>>> Thanks!
>>>
>>> Shagun
>>>
>>
>>
>


Bringing up JDBC Tests to trunk

2015-10-21 Thread Luciano Resende
I have started looking into PR-8101 [1] and what is required to merge it
into trunk, which will also unblock me on SPARK-10521 [2].

So here is the minimal plan I was thinking about:

- pin the docker image version so we are sure we are always using the same
image every time (see the example after this list)
- pre-pull the required images on the Jenkins executors so tests are not
delayed or timed out waiting for docker images to download
- create a profile to run the JDBC tests
- create daily jobs for running the JDBC tests
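
On the first point, a hedged example of how the pinning could look (the image
name, tag, and digest below are placeholders, not the actual test images):

    # pin to an explicit tag rather than a floating one like :latest ...
    docker pull mysql:<fixed-tag>
    # ... or, stricter, pin by content digest so the image bytes can never change
    docker pull mysql@sha256:<digest-of-the-tested-image>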


In parallel, I learned that Alan Chin from my team is working with the
AmpLab team to expand the build capacity for Spark, so I will use some of
the nodes he is preparing to test/run these builds for now.

Please let me know if there is anything else needed around this.


[1] https://github.com/apache/spark/pull/8101
[2] https://issues.apache.org/jira/browse/SPARK-10521

-- 
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-21 Thread Jerry Lam
Hi guys,

There is another memory issue. I'm not sure if this is related to Tungsten
this time because I have it disabled (spark.sql.tungsten.enabled=false). It
happens more when there are too many tasks running (300). I need to limit the
number of tasks to avoid this. The executor has 6G. Spark 1.5.1 is being used.

Best Regards,

Jerry

org.apache.spark.SparkException: Task failed while writing rows.
at 
org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:393)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unable to acquire 67108864 bytes of memory
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
at 
org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:74)
at 
org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:56)
at 
org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:339)


On Tue, Oct 20, 2015 at 9:10 PM, Reynold Xin  wrote:

> With Jerry's permission, sending this back to the dev list to close the
> loop.
>
>
> -- Forwarded message --
> From: Jerry Lam 
> Date: Tue, Oct 20, 2015 at 3:54 PM
> Subject: Re: If you use Spark 1.5 and disabled Tungsten mode ...
> To: Reynold Xin 
>
>
> Yup, coarse grained mode works just fine. :)
> The difference is that by default, coarse grained mode uses 1 core per
> task. If I constrain 20 cores in total, there can be only 20 tasks running
> at the same time. However, with fine grained, I cannot set the total number
> of cores, and therefore there could be 200+ tasks running at the same time
> (it is dynamic). So it might be that the calculation of how much memory to
> acquire fails when the number of cores cannot be known ahead of time,
> because you cannot assume a fixed number of tasks running in an executor?
> Just my guess...
>
>
> On Tue, Oct 20, 2015 at 6:24 PM, Reynold Xin  wrote:
>
>> Can you try coarse-grained mode and see if it is the same?
>>
>>
>> On Tue, Oct 20, 2015 at 3:20 PM, Jerry Lam  wrote:
>>
>>> Hi Reynold,
>>>
>>> Yes, I'm using 1.5.1. I see them quite often. Sometimes it recovers but
>>> sometimes it does not. For one particular job, it failed all the time with
>>> the acquire-memory issue. I'm using spark on mesos with fine grained mode.
>>> Does it make a difference?
>>>
>>> Best Regards,
>>>
>>> Jerry
>>>
>>> On Tue, Oct 20, 2015 at 5:27 PM, Reynold Xin 
>>> wrote:
>>>
 Jerry - I think that's been fixed in 1.5.1. Do you still see it?

 On Tue, Oct 20, 2015 at 2:11 PM, Jerry Lam 
 wrote:

> I disabled it because of the "Could not acquire 65536 bytes of
> memory" error, which causes the job to fail. So for now, I'm not touching it.
>
> On Tue, Oct 20, 2015 at 4:48 PM, charmee  wrote:
>
>> We had disabled tungsten after we found a few performance issues, but had
>> to enable it back because we found that with a large number of group-by
>> fields, the shuffle keeps failing when tungsten is disabled.
>>
>> Here is an excerpt from one of our engineers with his analysis.
>>
>> With Tungsten Enabled (default in spark 1.5):
>> ~90 files of 0.5G each:
>>
>> Ingest (after applying broadcast lookups) : 54 min
>> Aggregation (~30 fields in group by and another 40 in aggregation) :
>> 18 min
>>
>> With Tungsten Disabled:
>>
>> Ingest : 30 min
>> Aggregation : Erroring out
>>
>> On smaller tests we found that joins are slow with tungsten enabled.
>> With GROUP BY, disabling tungsten does not work in the first place.
>>
>> Hope this helps.
>>
>> -Charmee
>>
>>
>>

Re: Bringing up JDBC Tests to trunk

2015-10-21 Thread Josh Rosen
Hey Luciano,

This sounds like a reasonable plan to me. One of my colleagues has written
some Dockerized MySQL testing utilities, so I'll take a peek at those to
see if there are any specifics of their solution that we should adapt for
Spark.

On Wed, Oct 21, 2015 at 1:16 PM, Luciano Resende 
wrote:

> I have started looking into PR-8101 [1] and what is required to merge it
> into trunk, which will also unblock me on SPARK-10521 [2].
>
> So here is the minimal plan I was thinking about:
>
> - pin the docker image version so we are sure we are always using the
> same image every time
> - pre-pull the required images on the Jenkins executors so tests are not
> delayed or timed out waiting for docker images to download
> - create a profile to run the JDBC tests
> - create daily jobs for running the JDBC tests
>
>
> In parallel, I learned that Alan Chin from my team is working with the
> AmpLab team to expand the build capacity for Spark, so I will use some of
> the nodes he is preparing to test/run these builds for now.
>
> Please let me know if there is anything else needed around this.
>
>
> [1] https://github.com/apache/spark/pull/8101
> [2] https://issues.apache.org/jira/browse/SPARK-10521
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-21 Thread Reynold Xin
Is this still Mesos fine grained mode?


On Wed, Oct 21, 2015 at 1:16 PM, Jerry Lam  wrote:

> Hi guys,
>
> There is another memory issue. I'm not sure if this is related to Tungsten
> this time because I have it disabled (spark.sql.tungsten.enabled=false). It
> happens more when there are too many tasks running (300). I need to limit the
> number of tasks to avoid this. The executor has 6G. Spark 1.5.1 is being used.
>
> Best Regards,
>
> Jerry
>
> org.apache.spark.SparkException: Task failed while writing rows.
>   at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:393)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Unable to acquire 67108864 bytes of memory
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
>   at 
> org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:74)
>   at 
> org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:56)
>   at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:339)
>
>
> On Tue, Oct 20, 2015 at 9:10 PM, Reynold Xin  wrote:
>
>> With Jerry's permission, sending this back to the dev list to close the
>> loop.
>>
>>
>> -- Forwarded message --
>> From: Jerry Lam 
>> Date: Tue, Oct 20, 2015 at 3:54 PM
>> Subject: Re: If you use Spark 1.5 and disabled Tungsten mode ...
>> To: Reynold Xin 
>>
>>
>> Yup, coarse grained mode works just fine. :)
>> The difference is that by default, coarse grained mode uses 1 core per
>> task. If I constrain 20 cores in total, there can be only 20 tasks running
>> at the same time. However, with fine grained, I cannot set the total number
>> of cores, and therefore there could be 200+ tasks running at the same time
>> (it is dynamic). So it might be that the calculation of how much memory to
>> acquire fails when the number of cores cannot be known ahead of time,
>> because you cannot assume a fixed number of tasks running in an executor?
>> Just my guess...
>>
>>
>> On Tue, Oct 20, 2015 at 6:24 PM, Reynold Xin  wrote:
>>
>>> Can you try coarse-grained mode and see if it is the same?
>>>
>>>
>>> On Tue, Oct 20, 2015 at 3:20 PM, Jerry Lam  wrote:
>>>
 Hi Reynold,

 Yes, I'm using 1.5.1. I see them quite often. Sometimes it recovers but
 sometimes it does not. For one particular job, it failed all the time with
 the acquire-memory issue. I'm using spark on mesos with fine grained mode.
 Does it make a difference?

 Best Regards,

 Jerry

 On Tue, Oct 20, 2015 at 5:27 PM, Reynold Xin 
 wrote:

> Jerry - I think that's been fixed in 1.5.1. Do you still see it?
>
> On Tue, Oct 20, 2015 at 2:11 PM, Jerry Lam 
> wrote:
>
>> I disabled it because of the "Could not acquire 65536 bytes of
>> memory" error, which causes the job to fail. So for now, I'm not touching it.
>>
>> On Tue, Oct 20, 2015 at 4:48 PM, charmee  wrote:
>>
>>> We had disabled tungsten after we found a few performance issues, but
>>> had to enable it back because we found that with a large number of
>>> group-by fields, the shuffle keeps failing when tungsten is disabled.
>>>
>>> Here is an excerpt from one of our engineers with his analysis.
>>>
>>> With Tungsten Enabled (default in spark 1.5):
>>> ~90 files of 0.5G each:
>>>
>>> Ingest (after applying broadcast lookups) : 54 min
>>> Aggregation (~30 fields in group by and another 40 in aggregation) :
>>> 18 min
>>>
>>> With Tungsten Disabled:
>>>
>>> Ingest : 30 min
>>> Aggregation : Erroring out
>>>

Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-21 Thread Chester Chen
All,

just to see if this happens to others as well.

  This is tested against the

   spark 1.5.1 (branch 1.5 with label 1.5.2-SNAPSHOT, commit on Tue
Oct 6, 84f510c4fa06e43bd35e2dc8e1008d0590cbe266)

   Spark deployment mode : Spark-Cluster

   Notice that if we enable Kerberos mode, the spark yarn client fails with
the following:

*Could not initialize class org.apache.hadoop.hive.ql.metadata.Hive*
*java.lang.NoClassDefFoundError: Could not initialize class
org.apache.hadoop.hive.ql.metadata.Hive*
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.deploy.yarn.Client$.org$apache$spark$deploy$yarn$Client$$obtainTokenForHiveMetastore(Client.scala:1252)
at
org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:271)
at
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:629)
at
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:907)


Diving into the Yarn Client.scala code and testing against different
dependencies, I noticed the following: if Kerberos mode is enabled,
Client.obtainTokenForHiveMetastore()
will try to use Scala reflection to get Hive and HiveConf and invoke methods
on them.


  val hiveClass =
mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
  val hive = hiveClass.getMethod("get").invoke(null)

  val hiveConf = hiveClass.getMethod("getConf").invoke(hive)
  val hiveConfClass =
mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")

  val hiveConfGet = (param: String) => Option(hiveConfClass
.getMethod("get", classOf[java.lang.String])
.invoke(hiveConf, param))


   If the "org.spark-project.hive" % "hive-exec" % "1.2.1.spark" is used,
then you will get above exception. But if we use the

   "org.apache.hive" % "hive-exec" "0.13.1-cdh5.2.0"

 The above method will not throw exception.


  Here are some questions and comments:

0) Is this a bug?

1) Why does the spark-project hive-exec behave differently? I understand
the spark-project hive-exec has fewer dependencies,

   but I would expect it to be functionally the same.

2) Where can I find the source code for the spark-project hive-exec?

3) Regarding the method obtainTokenForHiveMetastore():

   I would assume that the method would first check whether the
hive-metastore URI is present before

   trying to get the hive metastore tokens, but it seems to invoke the
reflection regardless of whether the hive service in the cluster is enabled
or not.

4) Noticed that obtainTokenForHBase() in the same class (Client.scala) catches

   case e: java.lang.NoClassDefFoundError => logDebug("HBase Class not
found: " + e)

   and just ignores the exception (logs it at debug level),

   but obtainTokenForHiveMetastore() does not catch the
NoClassDefFoundError exception; I guess this is the problem.

private def obtainTokenForHiveMetastore(conf: Configuration,
    credentials: Credentials) {

  // rest of code

  } catch {
    case e: java.lang.NoSuchMethodException => { logInfo("Hive Method
      not found " + e); return }
    case e: java.lang.ClassNotFoundException => { logInfo("Hive Class
      not found " + e); return }
    case e: Exception => { logError("Unexpected Exception " + e)
      throw new RuntimeException("Unexpected exception", e)
    }
  }
}
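
A hedged sketch (illustration only, not an actual patch) of how points 3) and
4) could be addressed together, reusing the hiveConfGet helper from the
reflection code above; hive.metastore.uris is the standard Hive property
naming the remote metastore:

private def obtainTokenForHiveMetastore(conf: Configuration,
    credentials: Credentials) {
  try {
    // ... reflection code from above, yielding hiveConf and hiveConfGet ...

    // 3) Bail out early when no remote metastore is configured, so the
    //    Hive classes are never touched on clusters without a Hive service.
    val metastoreUri = hiveConfGet("hive.metastore.uris").map(_.toString).getOrElse("")
    if (metastoreUri.isEmpty) {
      logDebug("hive.metastore.uris is not set; skipping Hive metastore token")
      return
    }

    // ... fetch and add the delegation token as before ...
  } catch {
    // 4) Mirror obtainTokenForHBase: treat a missing or uninitializable
    //    Hive class as "no Hive here" instead of failing the submission.
    case e: java.lang.NoClassDefFoundError => logDebug("Hive class not found: " + e)
    case e: java.lang.NoSuchMethodException => logInfo("Hive method not found " + e)
    case e: java.lang.ClassNotFoundException => logInfo("Hive class not found " + e)
  }
}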


thanks


Chester


Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-21 Thread Jerry Lam
Yes. The crazy thing about Mesos running in fine-grained mode is that there
is no way (correct me if I'm wrong) to set the number of cores per
executor. If one of my slaves on Mesos has 32 cores, fine-grained mode
can allocate 32 cores on this executor for the job, and if there are 32
tasks running on this executor at the same time, that is when the
acquire-memory issue appears. Of course the 32 cores are dynamically
allocated, so Mesos can take them back or hand them out again depending on
the cluster utilization.
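
For reference, a sketch of the coarse-grained settings being compared in this
thread (property names as in the Spark 1.5 configuration docs; the values are
illustrative only):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.mesos.coarse", "true")          // coarse-grained: fixed executors
  .set("spark.cores.max", "20")               // cap total cores, so at most 20 concurrent tasks
  .set("spark.sql.tungsten.enabled", "false") // the 1.5 flag discussed above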

On Wed, Oct 21, 2015 at 5:13 PM, Reynold Xin  wrote:

> Is this still Mesos fine grained mode?
>
>
> On Wed, Oct 21, 2015 at 1:16 PM, Jerry Lam  wrote:
>
>> Hi guys,
>>
>> There is another memory issue. I'm not sure if this is related to Tungsten
>> this time because I have it disabled (spark.sql.tungsten.enabled=false). It
>> happens more when there are too many tasks running (300). I need to limit the
>> number of tasks to avoid this. The executor has 6G. Spark 1.5.1 is being used.
>>
>> Best Regards,
>>
>> Jerry
>>
>> org.apache.spark.SparkException: Task failed while writing rows.
>>  at 
>> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:393)
>>  at 
>> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>>  at 
>> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
>>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>  at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.IOException: Unable to acquire 67108864 bytes of memory
>>  at 
>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
>>  at 
>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:138)
>>  at 
>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
>>  at 
>> org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:74)
>>  at 
>> org.apache.spark.sql.execution.UnsafeKVExternalSorter.(UnsafeKVExternalSorter.java:56)
>>  at 
>> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:339)
>>
>>
>> On Tue, Oct 20, 2015 at 9:10 PM, Reynold Xin  wrote:
>>
>>> With Jerry's permission, sending this back to the dev list to close the
>>> loop.
>>>
>>>
>>> -- Forwarded message --
>>> From: Jerry Lam 
>>> Date: Tue, Oct 20, 2015 at 3:54 PM
>>> Subject: Re: If you use Spark 1.5 and disabled Tungsten mode ...
>>> To: Reynold Xin 
>>>
>>>
>>> Yup, coarse grained mode works just fine. :)
>>> The difference is that by default, coarse grained mode uses 1 core per
>>> task. If I constrain 20 cores in total, there can be only 20 tasks running
>>> at the same time. However, with fine grained, I cannot set the total number
>>> of cores, and therefore there could be 200+ tasks running at the same time
>>> (it is dynamic). So it might be that the calculation of how much memory to
>>> acquire fails when the number of cores cannot be known ahead of time,
>>> because you cannot assume a fixed number of tasks running in an executor?
>>> Just my guess...
>>>
>>>
>>> On Tue, Oct 20, 2015 at 6:24 PM, Reynold Xin 
>>> wrote:
>>>
 Can you try coarse-grained mode and see if it is the same?


 On Tue, Oct 20, 2015 at 3:20 PM, Jerry Lam 
 wrote:

> Hi Reynold,
>
> Yes, I'm using 1.5.1. I see them quite often. Sometimes it recovers
> but sometimes it does not. For one particular job, it failed all the time
> with the acquire-memory issue. I'm using spark on mesos with fine grained
> mode. Does it make a difference?
>
> Best Regards,
>
> Jerry
>
> On Tue, Oct 20, 2015 at 5:27 PM, Reynold Xin 
> wrote:
>
>> Jerry - I think that's been fixed in 1.5.1. Do you still see it?
>>
>> On Tue, Oct 20, 2015 at 2:11 PM, Jerry Lam 
>> wrote:
>>
>>> I disabled it because of the "Could not acquire 65536 bytes of
>>> memory" error, which causes the job to fail. So for now, I'm not touching it.
>>>
>>> On Tue, Oct 20, 2015 at 4:48 PM, charmee  wrote:
>>>

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-21 Thread Chester Chen
Doug,
  thanks for responding.
 >> I think Spark just needs to be compiled against 1.2.1

   Can you elaborate on this, or point me to the specific command you are
referring to?

   In our build.scala, I was including the following:

"org.spark-project.hive" % "hive-exec" % "1.2.1.spark" intransitive()

   I am not sure how the Spark compilation is directly related to this;
please explain.

   When we submit the spark job, we call the Spark Yarn Client.scala
directly (not using spark-submit).
   The client side does not depend on the spark-assembly jar (which is in the
hadoop cluster). The job submission actually fails on the client side.

   Currently we get around this by replacing Spark's hive-exec with the
Apache hive-exec.
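
For example, the swap in build.scala would look something like this (a hedged
sketch, using the CDH coordinates quoted earlier in the thread):

"org.apache.hive" % "hive-exec" % "0.13.1-cdh5.2.0" intransitive()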


Chester





On Wed, Oct 21, 2015 at 5:27 PM, Doug Balog  wrote:

> See comments below.
>
> > On Oct 21, 2015, at 5:33 PM, Chester Chen  wrote:
> >
> > All,
> >
> > just to see if this happens to others as well.
> >
> >   This is tested against the
> >
> >spark 1.5.1 (branch 1.5 with label 1.5.2-SNAPSHOT, commit on
> Tue Oct 6, 84f510c4fa06e43bd35e2dc8e1008d0590cbe266)
> >
> >Spark deployment mode : Spark-Cluster
> >
> >Notice that if we enable Kerberos mode, the spark yarn client fails
> with the following:
> >
> > Could not initialize class org.apache.hadoop.hive.ql.metadata.Hive
> > java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.hadoop.hive.ql.metadata.Hive
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at
> org.apache.spark.deploy.yarn.Client$.org$apache$spark$deploy$yarn$Client$$obtainTokenForHiveMetastore(Client.scala:1252)
> > at
> org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:271)
> > at
> org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:629)
> > at
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
> > at org.apache.spark.deploy.yarn.Client.run(Client.scala:907)
> >
> >
> > Diving into the Yarn Client.scala code and testing against different
> dependencies, I noticed the following: if Kerberos mode is enabled,
> Client.obtainTokenForHiveMetastore() will try to use Scala reflection to
> get Hive and HiveConf and invoke methods on them.
> >
> >   val hiveClass =
> mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
> >   val hive = hiveClass.getMethod("get").invoke(null)
> >
> >   val hiveConf = hiveClass.getMethod("getConf").invoke(hive)
> >   val hiveConfClass =
> mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
> >
> >   val hiveConfGet = (param: String) => Option(hiveConfClass
> > .getMethod("get", classOf[java.lang.String])
> > .invoke(hiveConf, param))
> >
> >If the "org.spark-project.hive" % "hive-exec" % "1.2.1.spark" is
> used, then you will get above exception. But if we use the
> >"org.apache.hive" % "hive-exec" "0.13.1-cdh5.2.0"
> >  The above method will not throw exception.
> >
> >   Here are some questions and comments:
> > 0) Is this a bug?
>
> I’m not an expert on this, but I think this might not be a bug.
> The Hive integration was redone for 1.5.0, see
> https://issues.apache.org/jira/browse/SPARK-6906
> and I think Spark just needs to be compiled against 1.2.1
>
>
> >
> > 1) Why does the spark-project hive-exec behave differently? I understand
> the spark-project hive-exec has fewer dependencies,
> >but I would expect it to be functionally the same.
>
> I don’t know.
>
> > 2) Where can I find the source code for the spark-project hive-exec?
>
> I don’t know.
>
> >
> > 3) Regarding the method obtainTokenForHiveMetastore():
> >I would assume that the method would first check whether the hive-metastore
> URI is present before
> >trying to get the hive metastore tokens, but it seems to invoke the
> reflection regardless of whether the hive service in the cluster is enabled or not.
>
> Checking to see if the hive-metastore URI is present before trying to get
> a delegation token would be an improvement.
> Checking to see if we are running in cluster mode would be good, too.
> I will file a JIRA and make these improvements.
>
> > 4) Noticed that obtainTokenForHBase() in the same class (Client.scala)
> catches
> >case e: java.lang.NoClassDefFoundError => logDebug("HBase Class not
> found: " + e)
> >and just ignores the exception (logs it at debug level),
> >but obtainTokenForHiveMetastore() does not catch the NoClassDefFoundError
> exception; I guess this is the problem.
> > private def obtainTokenForHiveMetastore(conf: Configuration,
> credentials: Credentials) {
> > // rest of code
> >  } catch {
> > case e: java.lang.NoSuchMethodException => { logInfo("Hive Method not found " + e); return }

SPARK_DRIVER_MEMORY doc wrong

2015-10-21 Thread tyronecai
In conf/spark-env.sh.template
https://github.com/apache/spark/blob/master/conf/spark-env.sh.template#L42
# - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 1G)


SPARK_DRIVER_MEMORY is the memory config for the driver, not the master.
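
The corrected template line would presumably read:

# - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)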

Thanks!





Re: SPARK_DRIVER_MEMORY doc wrong

2015-10-21 Thread Sean Owen
You're welcome to open a little pull request to fix that.

On Wed, Oct 21, 2015, 10:47 AM tyronecai  wrote:

> In conf/spark-env.sh.template
> https://github.com/apache/spark/blob/master/conf/spark-env.sh.template#L42
> # - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 1G)
>
>
> SPARK_DRIVER_MEMORY is the memory config for the driver, not the master.
>
> Thanks!
>
>
>
>


Exception when using cosh

2015-10-21 Thread Shagun Sodhani
Hi! I was trying out different arithmetic functions in SparkSql and noticed
a weird thing: while the *sinh* and *tanh* functions work, using *cosh*
results in an error saying:

*Exception in thread "main" org.apache.spark.sql.AnalysisException:
undefined function cosh;*

The documentation says *cosh* has been implemented since 1.4, and I find it
weird that *cosh* would fail when *tanh* and *sinh* are implemented (and
working). I looked for it on JIRA but could not find any related bug. Could
someone confirm whether this is an actual issue or something wrong on my
part?

Query I am using: SELECT cosh(`age`) as `data` FROM `table`
Spark Version: 10.4
SparkSql Version: 1.5.1

I am using the standard example of (name, age) schema (though I am setting
age as Double and not Int as I am trying out maths functions).
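
A minimal sketch of the setup described above, for anyone who wants to
reproduce this (assumes a Spark 1.5.1 spark-shell, where sc is already
available; names match the query):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// (name, age) schema with age as Double, as described above
val df = Seq(("alice", 30.0), ("bob", 25.0)).toDF("name", "age")
df.registerTempTable("table")

// sinh and tanh work; this one throws "undefined function cosh"
sqlContext.sql("SELECT cosh(`age`) as `data` FROM `table`").show()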

The entire error stack can be found here.

Thanks!

Shagun


FW: Spark Streaming scheduler delay VS driver.cores

2015-10-21 Thread Adrian Tanase
Apologies for reposting this to the dev list, but I've had no luck getting
information about spark.driver.cores on the user list.

Happy to create a PR with documentation improvements for the spark.driver.cores 
config setting after I get some more details.

Thanks!
-adrian

From: Adrian Tanase
Date: Monday, October 19, 2015 at 10:09 PM
To: "u...@spark.apache.org"
Subject: Re: Spark Streaming scheduler delay VS driver.cores

Bump on this question: does anyone know what the effect of
spark.driver.cores is on the driver's ability to manage larger clusters?

Any tips on setting a correct value? I’m running Spark streaming on Yarn / 
Hadoop 2.6 / Spark 1.5.1.

Thanks,
-adrian

From: Adrian Tanase
Date: Saturday, October 17, 2015 at 10:58 PM
To: "u...@spark.apache.org"
Subject: Spark Streaming scheduler delay VS driver.cores

Hi,

I've recently bumped up the resources for a spark streaming job, and the
performance started to degrade over time.
It was running fine on 7 nodes with 14 executor cores each (via Yarn) until I
bumped executor.cores to 22 cores/node (out of 32 on AWS c3.xlarge, 24 for yarn).

The driver has 2 cores and 2 GB ram (usage is at zero).

For really low data volume it goes from 1-2 seconds per batch to 4-5 s/batch
after about 6 hours, doing almost nothing. I've noticed that the scheduler
delay is 3-4 s, even 5-6 seconds for some tasks, when it should be in the low
tens of milliseconds. What's weirder is that under moderate load (thousands
of events per second) the delay is not as obvious anymore.

After this I reduced executor.cores to 20 and bumped driver.cores to 4, and
it seems to be OK now.
However, this is totally empirical; I have not found any documentation, code
samples, or email discussion on how to properly set driver.cores.
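
For reference, the knob in question is applied at submit time; an illustrative
invocation (the class name and jar are placeholders), where --driver-cores
maps to spark.driver.cores and only takes effect in cluster mode:

spark-submit \
  --master yarn-cluster \
  --driver-cores 4 \
  --driver-memory 2g \
  --class com.example.StreamingApp \
  streaming-app.jar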

Does anyone know:

  *   If I assign more cores to the driver/application manager, will it use
them?
      -   I was looking at the process list with htop and only one of the
JVMs on the driver was really taking up CPU time
  *   What is a decent parallelism factor for a streaming app with a 10-20
second batch time? I found it odd that at 7 x 22 = 154 cores the driver is
becoming a bottleneck
      -   I've seen people recommend 3-4 tasks/core or ~1000 parallelism for
clusters in the tens of nodes

Thanks in advance,
-adrian