[ANNOUNCE] Apache Toree 0.2.0-incubating Released

2018-08-15 Thread Luciano Resende
Apache Toree is a kernel for the Jupyter Notebook platform, providing interactive and remote access to Apache Spark. The Apache Toree community is pleased to announce the release of Apache Toree 0.2.0-incubating, which provides various bug fixes and the following enhancements. * Support Apache

JdbcRDD - schema always resolved as nullable=true

2018-08-15 Thread Subhash Sriram
Hi Spark Users, We do a lot of processing in Spark using data that is in MS SQL server. Today, I created a DataFrame against a table in SQL Server using the following: val dfSql=spark.read.jdbc(connectionString, table, props) I noticed that every column in the DataFrame showed as
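As a point of reference, here is a minimal sketch of the kind of read being described, printing the nullability the JDBC source reports for each column; the connection string, table name, and credentials are hypothetical placeholders:

```
import java.util.Properties

import org.apache.spark.sql.SparkSession

object JdbcNullableCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-nullable-check").getOrCreate()

    // Hypothetical connection details, for illustration only.
    val connectionString = "jdbc:sqlserver://dbhost:1433;databaseName=mydb"
    val table = "dbo.my_table"
    val props = new Properties()
    props.setProperty("user", "spark_reader")
    props.setProperty("password", "secret")

    val dfSql = spark.read.jdbc(connectionString, table, props)

    // Print each field's reported nullability; the observation in this thread is
    // that every column comes back with nullable = true, regardless of NOT NULL
    // constraints on the SQL Server side.
    dfSql.schema.fields.foreach(f => println(s"${f.name}: nullable=${f.nullable}"))

    spark.stop()
  }
}
```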

Re: Unable to see completed application in Spark 2 history web UI

2018-08-15 Thread Manu Zhang
If you are able to log onto the node where the UI has been launched, then try `ps -aux | grep HistoryServer`; the first column of the output should be the user. On Wed, Aug 15, 2018 at 10:26 PM Fawze Abujaber wrote: > Thanks Manu, Do you know how I can see which user the UI is running as, > because I'm

java.lang.UnsupportedOperationException: No Encoder found for Set[String]

2018-08-15 Thread V0lleyBallJunki3
Hello, I am using Spark 2.2.2 with Scala 2.11.8. I wrote a short program: val spark = SparkSession.builder().master("local[4]").getOrCreate() case class TestCC(i: Int, ss: Set[String]) import spark.implicits._ import spark.sqlContext.implicits._ val testCCDS = Seq(TestCC(1,Set("SS","Salil")),
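For readers hitting the same error, a hedged sketch of two common workarounds follows: supplying a Kryo encoder explicitly for the case class, or modelling the field as Seq[String] so the built-in product encoder applies. The object and value names are illustrative, not from the original program.

```
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

case class TestCC(i: Int, ss: Set[String])
// Modelling the collection as Seq[String] is the simplest fix, since a
// product encoder for Seq fields is derived automatically.
case class TestCCSeq(i: Int, ss: Seq[String])

object SetEncoderWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[4]").getOrCreate()
    import spark.implicits._

    // Option 1: keep Set[String] but supply a Kryo encoder explicitly.
    // The Dataset is then stored as opaque binary, so individual fields
    // are not queryable as columns with Spark SQL.
    val kryoEncoder: Encoder[TestCC] = Encoders.kryo[TestCC]
    val kryoDS = spark.createDataset(Seq(TestCC(1, Set("SS", "Salil"))))(kryoEncoder)

    // Option 2: convert the Set to a Seq and keep a regular columnar schema.
    val seqDS = Seq(TestCCSeq(1, Set("SS", "Salil").toSeq)).toDS()
    seqDS.printSchema()

    spark.stop()
  }
}
```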

from_json schema order

2018-08-15 Thread Brandon Geise
Hi, Can someone confirm whether ordering matters between the schema and underlying JSON string? Thanks, Brandon
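One way to check this empirically is to declare the schema fields in the opposite order of the keys in the JSON string and inspect the result. The sketch below assumes from_json matches struct fields by name, so key order in the input string should not matter, while the output struct follows the schema's declared field order:

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

object FromJsonOrderCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Schema fields declared in the opposite order of the keys in the JSON string.
    val schema = new StructType()
      .add("b", StringType)
      .add("a", IntegerType)

    val df = Seq("""{"a": 1, "b": "x"}""").toDF("json")

    // Struct fields are matched by name, so key order in the input string should
    // not matter; the output struct follows the schema's declared field order.
    df.select(from_json($"json", schema).as("parsed")).show(false)

    spark.stop()
  }
}
```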

Dynamic Allocation not removing executors

2018-08-15 Thread Maximiliano Patricio Méndez
Hi, I found an issue trying to use dynamic allocation in 2.3.1 where the driver does not remove idle executors under some circumstances. For the first instance of this happening, it seems that a change introduced in 2.2.1/2.3.0 (SPARK-21656)
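For context, a minimal sketch of the dynamic allocation settings involved in such a setup; the values are illustrative and not taken from the original report:

```
import org.apache.spark.sql.SparkSession

object DynamicAllocationCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dyn-alloc-check")
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.shuffle.service.enabled", "true")              // external shuffle service is required
      .config("spark.dynamicAllocation.minExecutors", "1")
      .config("spark.dynamicAllocation.maxExecutors", "20")
      .config("spark.dynamicAllocation.executorIdleTimeout", "60s") // idle executors should be released after this
      .getOrCreate()

    // Run some work, then let the application sit idle and watch the Executors tab
    // in the UI to see whether idle executors are actually removed.
    spark.range(0, 1000000).selectExpr("id % 10 as k").groupBy("k").count().collect()
    Thread.sleep(5 * 60 * 1000)

    spark.stop()
  }
}
```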

Re: from_json function

2018-08-15 Thread Maxim Gekk
Hello Denis, The from_json function supports only the fail fast mode, see: https://github.com/apache/spark/blob/e2ab7deae76d3b6f41b9ad4d0ece14ea28db40ce/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L568 Your settings "mode" -> "PERMISSIVE" will be
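A small sketch of the kind of call being discussed (illustrative; per the reply above, the "mode" option passed to from_json is expected to have no effect in this version):

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

object FromJsonModeCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val schema = new StructType().add("number", IntegerType)
    val df = Seq("{'number': 1}", "{'number': }").toDF("json")

    // Options are passed through to the JSON parser, but per the code referenced
    // above the parse mode itself is not configurable from here, so
    // "mode" -> "PERMISSIVE" is not expected to change the behaviour.
    df.select(from_json($"json", schema, Map("mode" -> "PERMISSIVE")).as("parsed"))
      .show(false)

    spark.stop()
  }
}
```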

[K8S] Spark initContainer custom bootstrap support for Spark master

2018-08-15 Thread Li Gao
Hi, We've noticed that on the latest master (not the Spark 2.3.1 branch), support for the Kubernetes initContainer is no longer there. What would be the path forward if we need to do custom bootstrap actions (i.e. run additional scripts) before the driver/executor containers enter the running state? Thanks,

Shuffle uses Direct Memory Buffer even after setting "spark.shuffle.io.preferDirectBufs = false"

2018-08-15 Thread Vaibhav Kulkarni
Hi, I am using Standalone Spark 2.3 and have a question regarding shuffle. Going by the documentation, the default shuffle behaviour is to use direct memory buffers. But even after I set the following parameter, I notice shuffle still uses direct memory buffers. spark.shuffle.io.preferDirectBufs
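For reference, a sketch of how the setting would typically be applied. As a hedged note, spark.shuffle.io.preferDirectBufs only steers Netty's own buffer allocations toward the heap; direct memory reported by the JVM can also come from other code paths, which may explain the observation above.

```
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object PreferHeapShuffleBufs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("shuffle-heap-bufs")
      // Ask the shuffle transport (Netty) to prefer heap buffers over direct buffers.
      .set("spark.shuffle.io.preferDirectBufs", "false")

    val spark = SparkSession.builder().config(conf).getOrCreate()

    // Trigger a shuffle, then inspect direct memory usage (e.g. via JMX or the
    // executor metrics) to see how much of it is actually attributable to Netty.
    spark.range(0, 10000000).selectExpr("id % 100 as k").groupBy("k").count().collect()

    spark.stop()
  }
}
```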

from_json function

2018-08-15 Thread dbolshak
Hello community, I cannot manage to run the from_json method with the "columnNameOfCorruptRecord" option. ``` import org.apache.spark.sql.functions._ val data = Seq( "{'number': 1}", "{'number': }" ) val schema = new StructType() .add($"number".int)
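A runnable reconstruction of the snippet above, offered as a hedged sketch: the corrupt-record column is added to the schema here because the option only makes sense when such a column exists, and whether the option takes effect at all is addressed in the from_json reply earlier in this digest.

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

object FromJsonCorruptRecord {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val data = Seq(
      "{'number': 1}",
      "{'number': }" // malformed record
    )

    // The corrupt-record column must be part of the schema for the option to be meaningful.
    val schema = new StructType()
      .add("number", IntegerType)
      .add("_corrupt_record", StringType)

    val parsed = data.toDF("json").select(
      from_json($"json", schema, Map("columnNameOfCorruptRecord" -> "_corrupt_record")).as("parsed"))

    // Per the from_json reply earlier in this digest, these options are effectively
    // ignored in this version, so the malformed row is expected to parse to null
    // rather than populate _corrupt_record.
    parsed.show(false)

    spark.stop()
  }
}
```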

Re: Unable to see completed application in Spark 2 history web UI

2018-08-15 Thread Fawze Abujaber
Thanks Manu. Do you know how I can see which user the UI is running as? I'm using Cloudera Manager and I created a user for Cloudera Manager and called it spark, but this didn't solve my issue, and here I'm trying to find out the user for the Spark history UI. On Wed, Aug 15, 2018 at 5:11 PM

Re: Unable to see completed application in Spark 2 history web UI

2018-08-15 Thread Manu Zhang
Hi Fawze, A) The file permission is currently hard-coded to 770 ( https://github.com/apache/spark/blob/branch-2.3/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L287 ). B) I think adding all users (including the UI user) to a group like spark will do. On Wed, Aug 15, 2018 at

Java API for statistics of spark job running on yarn

2018-08-15 Thread Serkan TAS
Hi all, I am facing an issue with a long-running Spark job on YARN. If a bottleneck occurs on HDFS and/or Kafka, the active batch count increases immediately. I am planning to check the active batch count with a Java client and create alarms for the operations group. So, is it possible to
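One hedged option is polling Spark's REST monitoring API from a plain HTTP client, as sketched below; the host, port, application id, and the exact JSON fields returned are assumptions that should be verified against the monitoring documentation for the Spark version in use.

```
import scala.io.Source

object ActiveBatchProbe {
  def main(args: Array[String]): Unit = {
    // Hypothetical UI host and YARN application id.
    val appId = "application_1534300000000_0001"
    val url = s"http://spark-ui-host:4040/api/v1/applications/$appId/streaming/statistics"

    // Fetch the streaming statistics JSON and inspect it for the active/waiting
    // batch counters; an alarm could be raised when the active batch count stays
    // above a threshold for several consecutive polls.
    val json = Source.fromURL(url).mkString
    println(json)
  }
}
```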

spark driver pod stuck in Waiting: PodInitializing state in Kubernetes

2018-08-15 Thread purna pradeep
I'm running a Spark 2.3 job on a Kubernetes cluster. kubectl version Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-09T21:51:06Z", GoVersion:"go1.9.4", Compiler:"gc",

Re: Unable to see completed application in Spark 2 history web UI

2018-08-15 Thread Fawze Abujaber
Hi Manu, Thanks for your response. Yes, I see, but it is still interesting to know how I can see these applications from the Spark history UI. How can I know which user I'm logged in as when I'm navigating the Spark history UI? The Spark process is running with cloudera-scm and the events written

Re: Unable to see completed application in Spark 2 history web UI

2018-08-15 Thread Manu Zhang
Hi Fawze, In Spark 2.3, the HistoryServer checks file permissions when reading event logs written by your applications (please check https://issues.apache.org/jira/browse/SPARK-20172). With file permissions of 770, the HistoryServer is not permitted to read the event log. That's why you were
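For completeness, a small sketch for confirming the symptom on a local event log directory by listing owner, group, and POSIX permissions; the path is a placeholder for spark.eventLog.dir, and for logs stored on HDFS the Hadoop FileSystem API would be needed instead.

```
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.{PosixFileAttributes, PosixFilePermissions}

import scala.collection.JavaConverters._

object EventLogPermissionCheck {
  def main(args: Array[String]): Unit = {
    // Placeholder for spark.eventLog.dir on a local filesystem.
    val dir = Paths.get("/user/spark/applicationHistory")

    Files.newDirectoryStream(dir).asScala.foreach { p =>
      val attrs = Files.readAttributes(p, classOf[PosixFileAttributes])
      val perms = PosixFilePermissions.toString(attrs.permissions())
      // With rwxrwx--- (770), only the owner and members of the group can read
      // the log, so the HistoryServer user must be one of the two.
      println(s"$p owner=${attrs.owner()} group=${attrs.group()} perms=$perms")
    }
  }
}
```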