Re: An alternative logic to collaborative filtering works fine but we are facing run time issues in executing the job

2019-04-16 Thread Ankit Khettry
Hi Balakumar, Two things. One - It seems like your cluster is running out of memory and then eventually out of disk, likely while materializing the dataframe to write (what's the volume of data created by the join?). Two - Your job is running in local mode, and is able to utilize just the master
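A minimal sketch of the second point, assuming the job is currently being launched with the default local master; the master URL, executor counts, and memory settings below are illustrative placeholders, not values taken from this thread:

    from pyspark.sql import SparkSession

    # Hypothetical example: point the session at the cluster's resource manager
    # (YARN here as a placeholder) instead of the default local[*] master, and
    # size executors explicitly so the join is not confined to the driver node.
    spark = (
        SparkSession.builder
        .appName("cf-alternative-join")           # placeholder app name
        .master("yarn")                           # or the spark://... / k8s://... URL of the cluster
        .config("spark.executor.instances", "6")  # illustrative sizing for a 3-node, 64GB/16-core cluster
        .config("spark.executor.cores", "5")
        .config("spark.executor.memory", "18g")
        .getOrCreate()
    )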

An alternative logic to collaborative filtering works fine but we are facing run time issues in executing the job

2019-04-16 Thread Balakumar iyer S
Hi, While running the following Spark code on the cluster with the following configuration, it is split into 3 job IDs. CLUSTER CONFIGURATION: 3-node cluster - NODE 1 - 64GB, 16 cores; NODE 2 - 64GB, 16 cores; NODE 3 - 64GB, 16 cores. At Job ID 2, the job is stuck at stage 51 of 254 and then it starts

Re: [External Sender] How to use same SparkSession in another app?

2019-04-16 Thread Femi Anthony
Why not save the dataframe to persistent storage (S3/HDFS) in the first application and read it back in the 2nd? On Tue, Apr 16, 2019 at 8:58 PM Rishikesh Gawade wrote: > Hi. > I wish to use a SparkSession created by one app in another app so that I > can use the dataframes belonging to that
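A hedged sketch of that handoff (storage paths, app names, and the Parquet format are placeholders, not details from the thread): the first application persists the dataframe, and the second builds its own SparkSession and reads it back.

    from pyspark.sql import SparkSession

    # Application 1: write the dataframe to shared storage (path is a placeholder).
    spark1 = SparkSession.builder.appName("producer-app").getOrCreate()
    df = spark1.range(1000)                      # stand-in for the real dataframe
    df.write.mode("overwrite").parquet("s3a://some-bucket/shared/df")

    # Application 2 (a separate process with its own session): read it back.
    spark2 = SparkSession.builder.appName("consumer-app").getOrCreate()
    df_again = spark2.read.parquet("s3a://some-bucket/shared/df")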

Re: How to use same SparkSession in another app?

2019-04-16 Thread Jacek Laskowski
Hi, Not possible. What are you really trying to do? Why do you need to share dataframes? They're nothing but metadata of a distributed computation (no data inside), so what would be the purpose of such sharing? Regards, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL

Re: Reading RDD by (key, data) from s3

2019-04-16 Thread yujhe.li
You can't; SparkContext is a singleton object. You have to use the Hadoop library or an AWS client to read files on S3.
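A sketch of that suggestion, assuming boto3 is available on the executors (bucket and key names are hypothetical): parallelize the keys, then read and decompress each object with the AWS client inside the map function, so no second SparkContext is ever needed.

    import gzip
    import io
    import json
    import boto3
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-key-reader").getOrCreate()
    sc = spark.sparkContext

    keys = ["data/part-0001.json.gz", "data/part-0002.json.gz"]  # placeholder keys

    def read_key(key, bucket="some-bucket"):   # hypothetical bucket name
        # The boto3 client is created inside the function so it is instantiated
        # on the executor rather than shipped from the driver.
        obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
        with gzip.GzipFile(fileobj=io.BytesIO(obj["Body"].read())) as gz:
            return [json.loads(line) for line in gz.read().decode("utf-8").splitlines() if line]

    data = sc.parallelize(keys).map(lambda k: (k, read_key(k)))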

Dynamic executor scaling spark/Kubernetes

2019-04-16 Thread purna pradeep
Hello, Is Kubernetes dynamic executor scaling for Spark available in the latest release of Spark? I mean scaling the executors based on the workload vs preallocating a number of executors for a Spark job. Thanks, Purna
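For context, a minimal sketch of the configuration involved; the property names come from Spark's dynamic allocation settings, but whether they are honoured on Kubernetes depends on the release in use, since classic dynamic allocation relies on an external shuffle service that the K8s scheduler did not provide at the time of this thread.

    from pyspark.sql import SparkSession

    # Illustrative only: these settings request executor scaling between bounds
    # instead of a fixed preallocation. The shuffle-tracking variant that makes
    # this work without an external shuffle service arrived in later releases (Spark 3.x).
    spark = (
        SparkSession.builder
        .appName("dyn-alloc-demo")                                        # placeholder name
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "1")
        .config("spark.dynamicAllocation.maxExecutors", "10")
        .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")  # Spark 3.x option
        .getOrCreate()
    )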

How to use same SparkSession in another app?

2019-04-16 Thread Rishikesh Gawade
Hi. I wish to use a SparkSession created by one app in another app so that I can use the dataframes belonging to that session. Is it possible to use the same SparkSession in another app? Thanks, Rishikesh

Reading RDD by (key, data) from s3

2019-04-16 Thread Gorka Bravo Martinez
Hi, I am trying to read gzipped JSON data from S3. My idea would be to do => data = s3_keys.mapValues(lambda x: s3_read_data(x)). For that I thought about using sc.textFile instead of s3_read_data, but it wouldn't work. Any idea how to achieve a solution here?
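One possible alternative to calling sc.textFile inside a transformation (which cannot work, since the context is only available on the driver) is to hand the S3 paths directly to Spark's reader, which decompresses .gz files transparently, and recover the originating key per record; bucket names and paths below are placeholders, not from the thread.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import input_file_name

    spark = SparkSession.builder.appName("gz-json-from-s3").getOrCreate()

    # Placeholder paths; spark.read.json handles gzip-compressed files directly.
    paths = ["s3a://some-bucket/data/part-0001.json.gz",
             "s3a://some-bucket/data/part-0002.json.gz"]

    # Keep the originating object key alongside each record.
    df = spark.read.json(paths).withColumn("source_key", input_file_name())
    rdd = df.rdd.map(lambda row: (row["source_key"], row))   # optional (key, data) view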

K8s-Spark client mode : Executor image not able to download application jar from driver

2019-04-16 Thread Nikhil Chinnapa
Environment: Spark 2.4.0, Kubernetes 1.14. Query: Does the application jar need to be part of both the Driver and Executor images? Invocation point (from Java code): sparkLaunch = new SparkLauncher().setMaster(LINUX_MASTER)