Hi,
I am trying to read gzipped JSON data from S3. My idea would be to do
something like:

    data = (s3_keys
            .mapValues(lambda x: s3_read_data(x)))

For that I thought about using sc.textFile instead of s3_read_data, but that
wouldn't work. Any idea how to achieve this?
Hi.
I wish to use a SparkSession created by one app in another app so that I
can use the dataframes belonging to that session. Is it possible to use the
same SparkSession in another app?
Thanks,
Rishikesh
Hello,
Is Kubernetes dynamic executor scaling available in the latest release of
Spark? I mean scaling the executors based on the workload, as opposed to
preallocating a fixed number of executors for a Spark job.
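To make it concrete, what I have in mind is something like the following
(just a sketch; I'm assuming Spark 3.x-style settings such as
spark.dynamicAllocation.shuffleTracking.enabled, and the master URL, image
name, and resource numbers are placeholders):

    spark-submit \
        --master k8s://https://<k8s-apiserver>:443 \
        --deploy-mode cluster \
        --conf spark.kubernetes.container.image=<my-spark-image> \
        --conf spark.dynamicAllocation.enabled=true \
        --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
        --conf spark.dynamicAllocation.minExecutors=1 \
        --conf spark.dynamicAllocation.maxExecutors=10 \
        local:///opt/spark/examples/my_job.py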
Thanks,
Purna
Hi,
Not possible. What are you really trying to do? Why do you need to share
dataframes? They're nothing but metadata of a distributed computation (no
data inside) so what would be the purpose of such sharing?
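To see why (a minimal sketch): a DataFrame is only a lazy query plan;
nothing is computed until an action runs, so there is nothing to "hand
over" to another app:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(10).filter("id % 2 == 0")

    df.explain()       # prints the query plan -- no rows exist yet
    print(df.count())  # only this action triggers a distributed job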
Regards,
Jacek Laskowski
https://about.me/JacekLaskowski
Mastering Spark SQL
Hi,
While running the following Spark code on the cluster with the following
configuration, it is spread across 3 job IDs:
CLUSTER CONFIGURATION
3 NODE CLUSTER
NODE 1 - 64GB 16CORES
NODE 2 - 64GB 16CORES
NODE 3 - 64GB 16CORES
At job ID 2, the job is stuck at stage 51 of 254, and then it starts
Why not save the dataframe to persistent storage (S3/HDFS) in the first
application and read it back in the second?
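A minimal sketch of that, assuming a path both applications can reach (the
path below is a placeholder):

    # First application: materialize the dataframe to shared storage.
    df.write.mode("overwrite").parquet("s3a://shared-bucket/tmp/my_df")

    # Second application (with its own SparkSession): read it back.
    df2 = spark.read.parquet("s3a://shared-bucket/tmp/my_df")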
On Tue, Apr 16, 2019 at 8:58 PM Rishikesh Gawade wrote:
> Hi.
> I wish to use a SparkSession created by one app in another app so that i
> can use the dataframes belonging to that
Hi Balakumar,
Two things.
One - It seems like your cluster is running out of memory and then
eventually out of disk, likely while materializing the dataframe to write
(what's the volume of data created by the join?).
Two - Your job is running in local mode, and is able to utilize just the
master.
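As a sketch of the fix (the flags are standard spark-submit options, but
the master URL and resource numbers below are placeholders for your setup),
submit with a non-local master so executors are placed on all three nodes:

    spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --num-executors 6 \
        --executor-cores 5 \
        --executor-memory 19g \
        your_job.py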
You can't; SparkContext is a singleton object (and can't be used inside
executors). You have to use the Hadoop library or an AWS client to read the
files from S3.
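For example, a minimal sketch with boto3 (assuming s3_keys is a pair RDD
whose values are S3 object keys, boto3 is installed on the executors, and
the bucket name is a placeholder):

    import gzip
    import io
    import json

    import boto3

    def s3_read_data(key, bucket="my-bucket"):
        # Runs on the executors; for many keys per partition,
        # mapPartitions with one client per partition is cheaper.
        s3 = boto3.client("s3")
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        with gzip.GzipFile(fileobj=io.BytesIO(raw)) as f:
            return json.loads(f.read().decode("utf-8"))

    data = s3_keys.mapValues(s3_read_data)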
Environment:
Spark: 2.4.0
Kubernetes: 1.14
Query: Does the application jar need to be part of both the Driver and
Executor images?
Invocation point (from Java code):
sparkLaunch = new SparkLauncher()
        .setMaster(LINUX_MASTER)