Spark job terminated without any errors

2018-05-18 Thread karthikjay
We have created multiple Spark jobs (as fat JARs) and run them with spark-submit under nohup. Most of the jobs quit after a while. We searched the logs for failures, but the only message that gave us a clue was "18/05/07 18:31:38 INFO Worker: Executor app-20180507180436-0016/0

Re: OneHotEncoderEstimator - java.lang.NoSuchMethodError: org.apache.spark.sql.Dataset.withColumns

2018-05-18 Thread Bryan Cutler
The example works for me. Please check your environment and ensure you are using Spark 2.3.0, where OneHotEncoderEstimator was introduced. On Fri, May 18, 2018 at 12:57 AM, Matteo Cossu wrote: > Hi, > > are you sure Dataset has a method withColumns? > > On 15 May 2018 at
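A NoSuchMethodError of this kind usually means the Spark classes on the runtime classpath differ from the ones the code was built against, so comparing the running version with the required one is a quick first check. The sketch below is only an illustration: the helper names `parse_version` and `is_at_least` are hypothetical, and it assumes the version string is obtained elsewhere (in PySpark, for instance, from `spark.version` on an existing SparkSession named `spark`).

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string as a 3-tuple of ints.

    Suffixes such as "-SNAPSHOT" or vendor tags are ignored; missing
    components are padded with zeros so "2.3" compares equal to "2.3.0".
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    return parts + (0,) * max(0, 3 - len(parts))


def is_at_least(version, minimum):
    """True if `version` is the same as or newer than `minimum`."""
    return parse_version(version) >= parse_version(minimum)


# Hypothetical usage: in a live PySpark session one might pass spark.version here.
running_version = "2.2.1"  # made-up value for illustration
supports_estimator = is_at_least(running_version, "2.3.0")
```

With the made-up value above, `supports_estimator` is False, which is the situation the reply asks the poster to rule out.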

RDD does not have sc error

2018-05-18 Thread Chao Fang
Hello, Today I used the SparkSession.read.format("HBASETABLE").option("zk", "zkaddress").load() API to create a Dataset from an HBase data source, and of course I wrote code that extends BaseRelation and PrunedFilteredScan to provide the logical plan for this HBase data source. I use InputFormat to

Spark on YARN in client-mode: do we need 1 vCore for the AM?

2018-05-18 Thread peay
Hello, I run a Spark cluster on YARN, and we have a bunch of client-mode applications we use for interactive work. Whenever we start one of these, an application master container is started. My understanding is that this is mostly an empty shell, used to request further containers or get

RE: How to Spark can solve this example

2018-05-18 Thread JUNG YOUSUN
How about Structured Streaming with Kafka? It can operate over time windows. For more information, see https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html Sincerely, Yousun Jeong From: Matteo

RE: How to Spark can solve this example

2018-05-18 Thread Esa Heikkinen
Hello, That is good to hear, but are there any good practical (Python or Scala) examples? This would help a lot. I tried to do that with Apache Flink (and its CEP) and it was not a piece of cake. Best, Esa From: Matteo Cossu Sent: Friday, May 18, 2018 10:51 AM To: Esa

Re: OneHotEncoderEstimator - java.lang.NoSuchMethodError: org.apache.spark.sql.Dataset.withColumns

2018-05-18 Thread Matteo Cossu
Hi, are you sure Dataset has a method withColumns? On 15 May 2018 at 16:58, Mina Aslani wrote: > Hi, > > I get the error below when I try to run the OneHotEncoderEstimator example. > https://github.com/apache/spark/blob/b74366481cc87490adf4e69d26389e >

Understanding the results from Spark's KMeans clustering object

2018-05-18 Thread shubham
Hello Everyone, I am performing clustering on a dataset using PySpark. To find the number of clusters I performed clustering over a range of values (2,20) and found the wsse (within-cluster sum of squares) values for each value of k. This is where I found something unusual. According to my
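For readers unfamiliar with the metric, the within-cluster sum of squares is the sum, over all points, of the squared distance from each point to its nearest cluster center. The following is a minimal pure-Python sketch of that definition, not Spark's implementation; the function names and the tiny 2-D data set are made up for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def wsse(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Made-up data: two tight groups far apart (illustration only).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Hypothetical centers for k = 1 and k = 2.
one_center = [(5.0, 1.0)]
two_centers = [(0.0, 1.0), (10.0, 1.0)]

cost_k1 = wsse(points, one_center)   # 4 points * (25 + 1) = 104.0
cost_k2 = wsse(points, two_centers)  # 4 points * 1 = 4.0
```

With optimal centers this cost cannot increase as k grows. K-means only finds a local optimum that depends on initialization, so the values from individual runs need not follow that pattern exactly.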

Re: How to Spark can solve this example

2018-05-18 Thread Matteo Cossu
Hello Esa, all the steps that you described can be performed with Spark. I don't know about CEP, but Spark Streaming should be enough. Best, Matteo On 18 May 2018 at 09:20, Esa Heikkinen wrote: > Hi > > > > I have attached fictive example (pdf-file) about

Exception thrown in awaitResult during application launch in yarn cluster

2018-05-18 Thread Shiyuan
Hi Spark-users, I am using PySpark on a YARN cluster. One of my Spark application launches failed. Only the driver container had started before it failed in the ACCEPTED state. The error message is very short and I cannot make sense of it. The error message is attached below. Any possible causes