Can I simply understand Spark Connect this way: The client process is now the
Spark driver?
From: Brian Huynh
Sent: Thursday, August 10, 2023 10:15 PM
To: Kezhi Xiong
Cc: user@spark.apache.org
Subject: Re: Spark Connect, Master, and Workers
Hi Kezhi,
Yes,
Hi Team,
I have had no luck trying to expose port 5005 (for remote debugging) on my
executor container using the following pod template and Spark configuration:
s3a://mybucket/pod-template-executor-debug.yaml
Question: Spark uses log4j 1.2.17. If my application jar contains log4j 2.x and
is submitted to the Spark cluster, which version of log4j actually gets used
during the Spark session?
From: Sean Owen
Sent: Monday, December 13, 2021 8:25 AM
To: Jörn Franke
Cc:
On Wed, 8 Dec 2021 at 19:45, James Yu <ja...@ispot.tv> wrote:
Just thought of another possibility, which is to containerize the history
server and run the container with a proper restart policy. This may be the
approach we take because the deploymen
Sent: Tuesday, December 7, 2021 1:29 PM
To: James Yu
Cc: user @spark
Subject: Re: start-history-server.sh doesn't survive system reboot.
Recommendation?
The scripts just launch the processes. To make any process restart on system
restart, you would need to set it up as a system service (i.e
Hi Users,
We found that the history server launched by using the
"start-history-server.sh" command does not survive system reboot. Any
recommendation of making it always up even after reboot?
Thanks,
James
See this ticket https://issues.apache.org/jira/browse/HADOOP-17201. It may
help your team.
From: Johnny Burns
Sent: Tuesday, June 22, 2021 3:41 PM
To: user@spark.apache.org
Cc: data-orchestration-team
Subject: Performance Problems Migrating to S3A Committers
rito
Sent: Wednesday, February 3, 2021 11:05 AM
To: James Yu ; user
Subject: Re: Poor performance caused by coalesce to 1
Coalesce reduces the parallelism of your last stage, in your case to 1 task, so
it's natural that it performs poorly, especially with large data. If you absol
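The point about coalesce collapsing the last stage to one task can be made concrete with a toy cost model. Spark's documentation for `coalesce` notes that a drastic coalesce (e.g. to 1 partition) can make the computation run on fewer nodes than desired, and that `repartition` avoids this by adding a shuffle so the upstream partitions still execute in parallel. The sketch below assumes hypothetical numbers (200 upstream partitions, 50 cores, 1 s per partition, 5 s for the final write) and ignores shuffle overhead entirely; the function `stage_wall_time` is illustrative only and not a Spark API.

```python
import math


def stage_wall_time(num_tasks: int, cores: int, secs_per_task: float) -> float:
    """Toy model: tasks of equal cost run in waves of `cores` at a time."""
    waves = math.ceil(num_tasks / cores)
    return waves * secs_per_task


# Hypothetical workload (assumed numbers, not measurements).
PARTITIONS = 200   # upstream partitions in the last stage
CORES = 50         # executor cores available
SECS = 1.0         # cost of processing one upstream partition
WRITE_SECS = 5.0   # cost of writing the single output file

# coalesce(1): no shuffle boundary, so all upstream work in the stage is
# pipelined into ONE task that also does the write.
coalesce_time = stage_wall_time(1, CORES, PARTITIONS * SECS + WRITE_SECS)

# repartition(1): the shuffle keeps the upstream stage at PARTITIONS tasks;
# only the final write stage runs as a single task (shuffle cost ignored).
repartition_time = (
    stage_wall_time(PARTITIONS, CORES, SECS)
    + stage_wall_time(1, CORES, WRITE_SECS)
)

print(f"coalesce(1):    {coalesce_time:.0f} s")
print(f"repartition(1): {repartition_time:.0f} s")
```

Under these assumptions the coalesced plan takes 205 s and the repartitioned plan 9 s; in practice the shuffle that `repartition` introduces has a real cost, so the gap will be smaller.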
Hi Team,
We are running into this poor performance issue and seeking your suggestion on
how to improve it:
We have a particular dataset which we aggregate from other datasets and would
like to write out to one single file (because it is small enough). We found
that after a series of
Henoc,
OK. That is for YARN with HDFS. What happens in a scenario with Kubernetes as
the resource manager and no HDFS?
James
From: Henoc
Sent: Thursday, August 13, 2020 10:45 PM
To: James Yu
Cc: user ; russell.spit...@gmail.com
Subject: Re: Where do the executors
Hi,
When I spark-submit a Spark app with my app jar located in S3, the driver
obviously downloads the jar from the S3 location. What is not clear to me is
where the executors get the jar from: from the same S3 location, somehow from
the driver, or do they not need the jar at all?
Thanks
Pol, thanks for your reply.
Actually, I am running Spark apps in cluster mode. Is what you said still
applicable in cluster mode? Thanks in advance for any further clarification.
From: Pol Santamaria
Sent: Friday, March 6, 2020 12:59 AM
To: James Yu
Cc: user
Hi,
Does a Spark driver always run single-threaded?
If yes, does it mean asking for more than one vCPU for the driver is wasteful?
Thanks,
James