Re: Spark Connect, Master, and Workers

2023-09-01 Thread James Yu
Can I simply understand Spark Connect this way: The client process is now the Spark driver? From: Brian Huynh Sent: Thursday, August 10, 2023 10:15 PM To: Kezhi Xiong Cc: user@spark.apache.org Subject: Re: Spark Connect, Master, and Workers Hi Kezhi, Yes,

[k8s] Fail to expose custom port on executor container specified in my executor pod template

2023-06-26 Thread James Yu
Hi Team, I have had no luck exposing port 5005 (for remote debugging) on my executor container using the following pod template and Spark configuration: s3a://mybucket/pod-template-executor-debug.yaml

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread James Yu
Question: Spark uses log4j 1.2.17. If my application jar contains log4j 2.x and is submitted to the Spark cluster, which version of log4j actually gets used during the Spark session? From: Sean Owen Sent: Monday, December 13, 2021 8:25 AM To: Jörn Franke Cc:

Re: start-history-server.sh doesn't survive system reboot. Recommendation?

2021-12-08 Thread James Yu
On Wed, 8 Dec 2021 at 19:45, James Yu <ja...@ispot.tv> wrote: Just thought about another possibility, which is to containerize the history server and run the container with a proper restart policy. This may be the approach we will be taking because the deploymen

Re: start-history-server.sh doesn't survive system reboot. Recommendation?

2021-12-08 Thread James Yu
Sent: Tuesday, December 7, 2021 1:29 PM To: James Yu Cc: user @spark Subject: Re: start-history-server.sh doesn't survive system reboot. Recommendation? The scripts just launch the processes. To make any process restart on system restart, you would need to set it up as a system service (i.e
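The "system service" suggestion in the reply above could look roughly like the following on a systemd-based host. This is only a sketch under stated assumptions: the install path /opt/spark, the service account "spark", and the helper name render_history_server_unit are all hypothetical and do not come from the thread. The helper just renders the text of a unit file.

```python
from textwrap import dedent


def render_history_server_unit(spark_home: str = "/opt/spark", user: str = "spark") -> str:
    """Return the text of a systemd unit that starts the history server at boot.

    Both defaults are assumptions for illustration, not values from the thread.
    """
    return dedent(f"""\
        [Unit]
        Description=Spark History Server
        After=network-online.target
        Wants=network-online.target

        [Service]
        # start-history-server.sh launches a background daemon and returns,
        # so the unit is declared as forking.
        Type=forking
        User={user}
        ExecStart={spark_home}/sbin/start-history-server.sh
        ExecStop={spark_home}/sbin/stop-history-server.sh
        Restart=on-failure

        [Install]
        WantedBy=multi-user.target
        """)


if __name__ == "__main__":
    # Print the unit so it can be reviewed before being installed.
    print(render_history_server_unit())
```

The rendered text would typically be saved as something like /etc/systemd/system/spark-history-server.service (the file name is an assumption) and activated with "systemctl enable --now spark-history-server", after which systemd starts it on every boot.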

start-history-server.sh doesn't survive system reboot. Recommendation?

2021-12-07 Thread James Yu
Hi Users, We found that the history server launched with the "start-history-server.sh" command does not survive a system reboot. Any recommendations for keeping it up across reboots? Thanks, James

Re: Performance Problems Migrating to S3A Committers

2021-08-05 Thread James Yu
See this ticket https://issues.apache.org/jira/browse/HADOOP-17201. It may help your team. From: Johnny Burns Sent: Tuesday, June 22, 2021 3:41 PM To: user@spark.apache.org Cc: data-orchestration-team Subject: Performance Problems Migrating to S3A Committers

Re: Poor performance caused by coalesce to 1

2021-02-03 Thread James Yu
rito Sent: Wednesday, February 3, 2021 11:05 AM To: James Yu ; user Subject: Re: Poor performance caused by coalesce to 1 Coalesce reduces the parallelism of your last stage, in your case to 1 task, so poor performance is to be expected, especially with large data. If you absol
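One commonly used way around the effect described in the reply above is to replace coalesce(1) with repartition(1). That is not necessarily what the truncated reply goes on to recommend. A minimal sketch, assuming a PySpark DataFrame and Parquet output (the helper name write_single_file, the output format, and the overwrite mode are all assumptions):

```python
def write_single_file(df, path: str) -> None:
    """Write `df` to `path` as a single Parquet part file.

    `df` is expected to be a pyspark.sql.DataFrame. repartition(1) inserts a
    shuffle, so the stages that compute `df` keep their parallelism and only
    the final write runs as one task. coalesce(1) avoids the shuffle, but it
    can pull the upstream computation into that single task as well.
    """
    df.repartition(1).write.mode("overwrite").parquet(path)
```

The trade-off is one extra shuffle of the final result, which is usually cheap when the aggregated dataset is small.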

Poor performance caused by coalesce to 1

2021-02-03 Thread James Yu
Hi Team, We are running into a performance problem and would appreciate your suggestions on how to improve it: We have a particular dataset which we aggregate from other datasets and would like to write out to a single file (because it is small enough). We found that after a series of

Re: Where do the executors get my app jar from?

2020-08-14 Thread James Yu
Henoc, OK. That is for YARN with HDFS. What happens when Kubernetes is the resource manager and there is no HDFS? James From: Henoc Sent: Thursday, August 13, 2020 10:45 PM To: James Yu Cc: user ; russell.spit...@gmail.com Subject: Re: Where do the executors

Where do the executors get my app jar from?

2020-08-13 Thread James Yu
Hi, When I spark-submit a Spark app with my app jar located in S3, obviously the Driver will download the jar from the S3 location. What is not clear to me is: where do the Executors get the jar from? From the same S3 location, or somehow from the Driver, or do they not need the jar? Thanks

Re: Spark driver thread

2020-03-06 Thread James Yu
Pol, thanks for your reply. Actually I am running Spark apps in CLUSTER mode. Is what you said still applicable in cluster mode? Thanks in advance for your further clarification. From: Pol Santamaria Sent: Friday, March 6, 2020 12:59 AM To: James Yu Cc: user

Spark driver thread

2020-03-05 Thread James Yu
Hi, Does a Spark driver always run single-threaded? If yes, does that mean asking for more than one vCPU for the driver is wasteful? Thanks, James