Reading RDD by (key, data) from s3

2019-04-16 Thread Gorka Bravo Martinez
Hi, I am trying to read gzipped JSON data from S3. My idea would be to do something like:

    data = s3_keys.mapValues(lambda key: s3_read_data(key))

For that I thought about using sc.textFile in place of s3_read_data, but that wouldn't work. Any idea how to achieve this?

How to use same SparkSession in another app?

2019-04-16 Thread Rishikesh Gawade
Hi. I wish to use a SparkSession created by one app in another app so that I can use the DataFrames belonging to that session. Is it possible to use the same SparkSession in another app? Thanks, Rishikesh

Dynamic executor scaling spark/Kubernetes

2019-04-16 Thread purna pradeep
Hello, is Kubernetes dynamic executor scaling for Spark available in the latest release of Spark? I mean scaling the executors based on the workload, versus preallocating a fixed number of executors for a Spark job. Thanks, Purna
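
[Editor's note: not part of the original message. Dynamic allocation on Kubernetes was not supported in the 2.4 line; it became usable with shuffle tracking in Spark 3.0. A minimal PySpark sketch of the relevant settings, assuming a Spark 3.x build, with illustrative values:]

    from pyspark.sql import SparkSession

    # Shuffle tracking is the Kubernetes-friendly substitute for an external
    # shuffle service; without one of the two, dynamic allocation won't engage.
    spark = (
        SparkSession.builder
        .appName("dynamic-scaling-sketch")  # hypothetical app name
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "1")
        .config("spark.dynamicAllocation.maxExecutors", "10")
        .getOrCreate()
    )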

Re: How to use same SparkSession in another app?

2019-04-16 Thread Jacek Laskowski
Hi, Not possible. What are you really trying to do? Why do you need to share DataFrames? They're nothing but the metadata of a distributed computation (no data inside), so what would be the purpose of such sharing? Regards, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL
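
[Editor's note: a tiny sketch of the "metadata, no data inside" point, assuming a live session named spark; not from the original reply:]

    # A DataFrame is a query plan, not a container of rows; nothing executes
    # until an action runs, so "sharing" one across apps would share no data.
    df = spark.range(10).selectExpr("id * 2 AS doubled")
    df.explain()  # prints the plan; no Spark job has been triggered yet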

An alternative logic to collaborative filtering works fine but we are facing run time issues in executing the job

2019-04-16 Thread Balakumar iyer S
Hi, while running the following Spark code on the cluster with the configuration below, it is split into 3 job IDs.

Cluster configuration (3-node cluster):
- Node 1: 64 GB, 16 cores
- Node 2: 64 GB, 16 cores
- Node 3: 64 GB, 16 cores

At job ID 2, the job gets stuck at stage 51 of 254, and then it starts

Re: [External Sender] How to use same SparkSession in another app?

2019-04-16 Thread Femi Anthony
Why not save the DataFrame to persistent storage (S3/HDFS) in the first application and read it back in the second? On Tue, Apr 16, 2019 at 8:58 PM Rishikesh Gawade wrote: > Hi. > I wish to use a SparkSession created by one app in another app so that I > can use the DataFrames belonging to that
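
[Editor's note: a minimal sketch of that suggestion in PySpark; the bucket path and app names are hypothetical:]

    from pyspark.sql import SparkSession

    # Application 1: write the DataFrame to shared storage.
    spark = SparkSession.builder.appName("producer").getOrCreate()
    df = spark.range(100)
    df.write.mode("overwrite").parquet("s3a://my-bucket/shared/df")

    # Application 2 (a separate process with its own session): read it back.
    spark = SparkSession.builder.appName("consumer").getOrCreate()
    df2 = spark.read.parquet("s3a://my-bucket/shared/df")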

Re: An alternative logic to collaborative filtering works fine but we are facing run time issues in executing the job

2019-04-16 Thread Ankit Khettry
Hi Balakumar, Two things. One: it seems like your cluster is running out of memory and then eventually out of disk, likely while materializing the DataFrame to write (what's the volume of data created by the join?). Two: your job is running in local mode, and is able to utilize just the master
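
[Editor's note: one quick check for the second point, assuming a live session named spark; illustrative, not from the original reply:]

    # If this prints local[*] (or similar), the job runs on a single machine
    # and the other cluster nodes sit idle regardless of their size.
    print(spark.sparkContext.master)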

Re: Reading RDD by (key, data) from s3

2019-04-16 Thread yujhe.li
You can't: SparkContext is a singleton object that lives only on the driver, so it can't be used inside a transformation. You have to use the Hadoop library or an AWS client to read the files on S3.
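
[Editor's note: a minimal sketch of that advice using boto3; the bucket name is hypothetical, and s3_keys is assumed to be an RDD of (id, key) pairs pointing at gzipped JSON objects:]

    import gzip
    import json
    from io import BytesIO

    import boto3

    def s3_read_data(key):
        # Create the client inside the function so nothing unpicklable is
        # captured in the closure shipped to the executors.
        s3 = boto3.client("s3")
        raw = s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()
        with gzip.GzipFile(fileobj=BytesIO(raw)) as f:
            return json.load(f)

    # mapValues keeps each key paired with the data read for it.
    data = s3_keys.mapValues(s3_read_data)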

K8s-Spark client mode : Executor image not able to download application jar from driver

2019-04-16 Thread Nikhil Chinnapa
Environment: Spark 2.4.0, Kubernetes 1.14. Query: Does the application jar need to be part of both the Driver and the Executor image? Invocation point (from Java code):

    sparkLaunch = new SparkLauncher()
        .setMaster(LINUX_MASTER)
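
[Editor's note: not an answer from the thread, but one common arrangement, sketched in PySpark configuration terms with hypothetical image name and paths: bake the jar into both images and reference it with the local:// scheme, so each executor loads it from its own filesystem instead of downloading it from the driver:]

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("k8s://https://kubernetes.default.svc")  # hypothetical API server
        .config("spark.submit.deployMode", "client")
        .config("spark.kubernetes.container.image", "myrepo/spark:2.4.0")
        .config("spark.jars", "local:///opt/app/myapp.jar")  # path inside BOTH images
        .getOrCreate()
    )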