[ANNOUNCE] Announcing Apache Spark 2.3.2

2018-09-26 Thread Saisai Shao
We are happy to announce the availability of Spark 2.3.2! Apache Spark 2.3.2 is a maintenance release, based on the branch-2.3 maintenance branch of Spark. We strongly recommend that all 2.3.x users upgrade to this stable release. To download Spark 2.3.2, head over to the download page:

Re: Spark 2.3.1: k8s driver pods stuck in Initializing state

2018-09-26 Thread Yinan Li
The spark-init ConfigMap is used for the init-container that is responsible for downloading remote dependencies. The k8s submission client run by spark-submit should create the ConfigMap and add a ConfigMap volume in the driver pod. Can you provide the command you used to run the job? On Wed, Sep
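
A quick way to verify what the submission client actually created (the pod name below is the one from this thread; the namespace is a placeholder):

    # Check that the init ConfigMap exists and is mounted into the driver.
    kubectl get configmaps -n default | grep init
    kubectl describe pod my-pod-fd79926b819d3b34b05250e23347d0e7-driver -n default
    # In the describe output, look for a ConfigMap volume (typically named
    # <pod-name>-init-config) under Volumes, and for FailedMount events
    # referencing it under Events.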

Spark 2.3.1: k8s driver pods stuck in Initializing state

2018-09-26 Thread purna pradeep
Hello , We're running spark 2.3.1 on kubernetes v1.11.0 and our driver pods from k8s are getting stuck in initializing state like so: NAME READY STATUS RESTARTS AGE my-pod-fd79926b819d3b34b05250e23347d0e7-driver 0/1 Init:0/1 0 18h And

Spark 2.3.1: k8s driver pods stuck in Initializing state

2018-09-26 Thread Purna Pradeep Mamillapalli
We're running spark 2.3.1 on kubernetes v1.11.0 and our driver pods from k8s are getting stuck in initializing state like so: NAME READY STATUS RESTARTS AGE my-pod-fd79926b819d3b34b05250e23347d0e7-driver 0/1 Init:0/1 0 18h And from *kubectl

Spark 2.3.1: k8s driver pods stuck in Initializing state

2018-09-26 Thread Christopher Carney
Our driver pods from k8s are getting stuck in initializing state like so: NAME READY STATUS RESTARTS AGE my-pod-fd79926b819d3b34b05250e23347d0e7-driver 0/1 Init:0/1 0 18h And from *kubectl describe pod*: *Warning FailedMount 9m (x128

Re: Given events with start and end times, how to count the number of simultaneous events using Spark?

2018-09-26 Thread kathleen li
You can use a Spark SQL window function, something like df.createOrReplaceTempView(“dfv”) followed by select count(eventid) over (partition by start_time, end_time order by start_time) from dfv. Sent from my iPhone > On Sep 26, 2018, at 11:32 AM, Debajyoti Roy wrote: > > The problem statement and an
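
For reference, a minimal Scala sketch of the running-count approach described in the linked Stack Overflow question, assuming string-sortable start_time/end_time columns: every start contributes +1, every end contributes -1, and a running sum ordered by timestamp gives the number of simultaneous events at each instant.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder.appName("overlap-count").getOrCreate()
    import spark.implicits._

    // Hypothetical input: one row per event (real code should cast the
    // strings to timestamps).
    val events = Seq(
      ("e1", "2018-09-26 10:00", "2018-09-26 11:00"),
      ("e2", "2018-09-26 10:30", "2018-09-26 12:00")
    ).toDF("event_id", "start_time", "end_time")

    // +1 marker at each event start, -1 marker at each event end.
    val markers = events.select($"start_time".as("ts"), lit(1).as("delta"))
      .union(events.select($"end_time".as("ts"), lit(-1).as("delta")))

    // Running sum over the markers ordered by time = events open at "ts".
    val w = Window.orderBy($"ts")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    markers.withColumn("open_events", sum($"delta").over(w)).show()

Note that an un-partitioned window collapses the data onto a single partition, so this is only suitable for modest event counts.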

Re: [Spark SQL] why spark sql hash() are returns the same hash value though the keys/expr are not same

2018-09-26 Thread Thakrar, Jayesh
Cannot reproduce your situation. Can you share your Spark version? [spark-shell welcome banner] version 2.2.0 Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92) Type
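
To rule out hashing the same expression twice, one can compare hash() outputs on distinct literals directly; a minimal sketch for the spark-shell (hash() is Spark SQL's built-in Murmur3 hash):

    import org.apache.spark.sql.functions.hash
    import spark.implicits._

    // Distinct inputs should almost always give distinct Murmur3 hashes;
    // equal outputs for "different" keys usually means the same columns
    // were passed to hash() twice rather than a genuine collision.
    val df = Seq(("k1", 1), ("k2", 2)).toDF("key", "value")
    df.select(hash($"key").as("h_key"),
              hash($"key", $"value").as("h_key_value")).show()
    spark.sql("SELECT hash('abc') AS h1, hash('abd') AS h2").show()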

RE: Python kubernetes spark 2.4 branch

2018-09-26 Thread Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Hi Ilan/Yinan, My observation is as follows: The dependent files specified with “--py-files http://10.75.145.25:80/Spark/getNN.py” are being downloaded and available in the container at “/var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/”. I guess we
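
For comparison, a minimal spark-submit invocation for this scenario; the API-server address, image name, and main application file are placeholders, while the --py-files URL is the one quoted above:

    bin/spark-submit \
      --master k8s://https://<api-server>:6443 \
      --deploy-mode cluster \
      --conf spark.kubernetes.container.image=<spark-py-image> \
      --py-files http://10.75.145.25:80/Spark/getNN.py \
      http://10.75.145.25:80/Spark/main.py   # placeholder main script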

Given events with start and end times, how to count the number of simultaneous events using Spark?

2018-09-26 Thread Debajyoti Roy
The problem statement and an approach to solve it using windows is described here: https://stackoverflow.com/questions/52509498/given-events-with-start-and-end-times-how-to-count-the-number-of-simultaneous-e Looking for more elegant/performant solutions, if they exist. TIA !

Re: Creating spark Row from database values

2018-09-26 Thread Kuttaiah Robin
Thanks, I'll check it out. On Wed, Sep 26, 2018 at 6:25 PM Shahab Yunus wrote: > Hi there. Have you seen this link? > https://medium.com/@mrpowers/manually-creating-spark-dataframes-b14dae906393 > > > It shows you multiple ways to manually create a dataframe. > > Hope it helps. > > Regards, >

Re: spark.lapply

2018-09-26 Thread Felix Cheung
It looks like the native R process was terminated by a buffer overflow. Do you know how much data is involved? From: Junior Alvarez Sent: Wednesday, September 26, 2018 7:33 AM To: user@spark.apache.org Subject: spark.lapply Hi! I’m using spark.lapply() in

Re: Creating spark Row from database values

2018-09-26 Thread Shahab Yunus
Hi there. Have you seen this link? https://medium.com/@mrpowers/manually-creating-spark-dataframes-b14dae906393 It shows you multiple ways to manually create a dataframe. Hope it helps. Regards, Shahab On Wed, Sep 26, 2018 at 8:02 AM Kuttaiah Robin wrote: > Hello, > > Currently I have

spark and STS tokens (Federation Tokens)

2018-09-26 Thread Ashic Mahtab
Hi, I'm looking to have spark jobs access S3 with temporary credentials. I've seen some examples around AssumeRole, but I have a scenario where the temp credentials are provided by GetFederationToken. Is there anything that can help, or do I need to use boto to execute GetFederationToken, and
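
If the temporary credentials are fetched outside Spark (e.g. with the AWS SDK or boto), one option is to hand them to the S3A connector as session credentials. A minimal sketch, assuming Hadoop 2.8+ s3a on the classpath; the environment variable names are placeholders for wherever your GetFederationToken response is stored:

    // Placeholder sources: in practice these three values come from the
    // GetFederationToken response obtained before the job starts.
    val tempAccessKey    = sys.env("TEMP_ACCESS_KEY")
    val tempSecretKey    = sys.env("TEMP_SECRET_KEY")
    val tempSessionToken = sys.env("TEMP_SESSION_TOKEN")

    val hc = spark.sparkContext.hadoopConfiguration
    hc.set("fs.s3a.aws.credentials.provider",
      "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
    hc.set("fs.s3a.access.key", tempAccessKey)
    hc.set("fs.s3a.secret.key", tempSecretKey)
    hc.set("fs.s3a.session.token", tempSessionToken)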

Creating spark Row from database values

2018-09-26 Thread Kuttaiah Robin
Hello, Currently I have an Oracle database table, described below:
  Table INSIGHT_ID_FED_IDENTIFIERS
  --------------------------------
  CURRENT_INSTANCE_ID   VARCHAR2(100)
  PREVIOUS_INSTANCE_ID  VARCHAR2(100)
Sample values in the table (basically the output of select * from
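
For what it's worth, a minimal sketch of building a DataFrame row by row against an explicit schema mirroring the table above (the sample values are made up; VARCHAR2 maps to StringType):

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder.appName("manual-rows").getOrCreate()

    // Schema mirroring INSIGHT_ID_FED_IDENTIFIERS.
    val schema = StructType(Seq(
      StructField("CURRENT_INSTANCE_ID", StringType, nullable = true),
      StructField("PREVIOUS_INSTANCE_ID", StringType, nullable = true)
    ))

    // Hypothetical sample values standing in for the database rows.
    val rows = Seq(Row("inst-002", "inst-001"), Row("inst-003", "inst-002"))

    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    df.show()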

spark.lapply

2018-09-26 Thread Junior Alvarez
Hi! I'm using spark.lapply() in SparkR on a Mesos service and I get the following crash randomly (the spark.lapply() function is called around 150 times; sometimes it crashes after 16 calls, other times after 25 calls, and so on... it is completely random, even though the data used in the actual call is

Re: Lightweight pipeline execution for single eow

2018-09-26 Thread Jatin Puri
Using FAIR mode, if there is no other way. I think there is a limitation on the number of parallel jobs that Spark can run. Is there a way to run more jobs in parallel? This is alright because this SparkContext would only be used during web service calls. I looked at the Spark configuration page
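
For reference, a minimal sketch of enabling the FAIR scheduler and tagging jobs with a pool (the pool name is made up). Jobs submitted from different threads already run concurrently when executor slots are free, so FAIR mode changes how contended slots are shared rather than raising a hard parallelism limit:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .appName("shared-context")
      .config("spark.scheduler.mode", "FAIR")
      .getOrCreate()

    // Each web-service request thread tags its jobs with a pool so that
    // concurrent jobs share executors instead of queuing behind each other.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", "requests")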

Re: How to access line fileName in loading file using the textFile method

2018-09-26 Thread vermanurag
Spark has sc.wholeTextFiles(), which returns an RDD of tuples. The first element of each tuple is the file name and the second element is the file content.
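
A minimal sketch (the input path is a placeholder):

    // Each element is (fileName, fileContent); split the content so every
    // line is paired with the file it came from.
    val rdd = sc.wholeTextFiles("hdfs:///data/input/*.txt")
    rdd.flatMap { case (fileName, content) =>
      content.split("\n").map(line => (fileName, line))
    }.take(5).foreach(println)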

Pivot Column ordering in spark

2018-09-26 Thread Manohar Rao
I am doing a pivot transformation on an input dataset. Input schema:
  |-- c_salutation: string (nullable = true)
  |-- c_preferred_cust_flag: string (nullable = true)
  |-- integer_type_col: integer (nullable = false)
  |-- long_type_col: long
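
On the ordering question: pivot sorts the distinct values it discovers, but passing the values explicitly fixes the column order and skips the extra job that collects them. A minimal sketch against the schema above, assuming df is the input dataset and that "Y"/"N" are the flag values (hypothetical):

    import org.apache.spark.sql.functions.sum
    import spark.implicits._

    // Explicit pivot values: output columns appear in this order (Y, N).
    val pivoted = df.groupBy($"c_salutation")
      .pivot("c_preferred_cust_flag", Seq("Y", "N"))
      .agg(sum($"integer_type_col"))
    pivoted.show()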