Spark Thrift Server - Not Scaling Down Executors 3.4.2+

2024-09-05 Thread Jayabindu Singh
Dear Spark Users,

We have run into an issue: with Spark 3.3.2, autoscaling with STS works
fine, but with 3.4.2 or 3.5.2 executors are left behind and never scale
down. The driver makes the call to remove an executor, but some (not all)
executors never actually get removed.
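For context, a sketch of the dynamic-allocation settings in play (standard
Spark property names; the timeout and executor counts here are illustrative,
not a recommendation):

```
spark.dynamicAllocation.enabled                   true
spark.dynamicAllocation.shuffleTracking.enabled   true
spark.dynamicAllocation.shuffleTracking.timeout   300s
spark.dynamicAllocation.executorIdleTimeout       60s
spark.dynamicAllocation.minExecutors              1
spark.dynamicAllocation.maxExecutors              20
```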

Has anyone else noticed this or aware of any reported issues?

Any help will be greatly appreciated.

Regards
Jay


Re: Seeking Guidance on Spark on Kubernetes Secrets Configuration

2023-09-30 Thread Jayabindu Singh
Hi Jon,

Using IAM as suggested by Jörn is the best approach.
We recently moved our Spark workload from HDP to Spark on K8s and are
utilizing IAM.
It saves you from secret-management headaches and also allows a lot more
flexibility in access control, including the option to allow access to
multiple S3 buckets from the same pod.
We have implemented this across Azure, Google, and AWS. Azure does require
some extra work to get going.
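On EKS, for example, this boils down to annotating the Spark driver/executor
service account with an IAM role via OIDC (IRSA). A minimal sketch, where the
namespace, account ID, and role name are placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-jobs
  annotations:
    # IRSA: pods using this service account assume the role below
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/spark-s3-access
```

Point Spark at it with
spark.kubernetes.authenticate.driver.serviceAccountName=spark, and the S3A
connector can pick up the web-identity credentials without any static keys.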

On Sat, Sep 30, 2023 at 12:05 PM Jörn Franke  wrote:

> Don’t use static IAM (S3) credentials. It is an outdated, insecure method -
> even AWS recommends against using it for anything (cf. e.g.
> https://docs.aws.amazon.com/cli/latest/userguide/cli-authentication-user.html
> ).
> It is almost a guarantee to get your data stolen and your account
> manipulated.
>
> If you need to use kubernetes (which has its own very problematic security
> issues) then assign AWS IAM roles with minimal permissions to the pods (for
> EKS it means using OIDC, cf
> https://docs.aws.amazon.com/eks/latest/userguide/service_IAM_role.html).
>
> Am 30.09.2023 um 03:41 schrieb Jon Rodríguez Aranguren <
> jon.r.arangu...@gmail.com>:
>
> 
> Dear Spark Community Members,
>
> I trust this message finds you all in good health and spirits.
>
> I'm reaching out to the collective expertise of this esteemed community
> with a query regarding Spark on Kubernetes. As a newcomer, I have always
> admired the depth and breadth of knowledge shared within this forum, and it
> is my hope that some of you might have insights on a specific challenge I'm
> facing.
>
> I am currently trying to configure multiple Kubernetes secrets, notably
> multiple S3 keys, at the SparkConf level for a Spark application. My
> objective is to understand the best approach or methods to ensure that
> these secrets can be smoothly accessed by the Spark application.
>
> If any of you have previously encountered this scenario or possess
> relevant insights on the matter, your guidance would be highly beneficial.
>
> Thank you for your time and consideration. I'm eager to learn from the
> experiences and knowledge present within this community.
>
> Warm regards,
> Jon
>
>


Spark Log Shipper to Cloud Bucket

2023-04-17 Thread Jayabindu Singh
Greetings Everyone!

We need to ship Spark (driver and executor) logs - not Spark event logs -
from K8s to a cloud bucket (ADLS/S3).
Using Fluent Bit we are able to ship the log files, but only to one single
path: container/logs/.
This will put a huge number of files in a single folder and will create
performance issues for list and search operations on files.
What we would like to do is dynamically create the output folder, e.g.
named after the Spark app.
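For context, our output stanza currently looks roughly like this (bucket,
region, and paths are illustrative); what we are after is replacing the fixed
prefix with something derived per app, e.g. via the record tag or pod labels:

```
[OUTPUT]
    Name            s3
    Match           kube.*
    bucket          spark-logs
    region          us-east-1
    # Today everything lands under one fixed prefix:
    s3_key_format   /container/logs/$UUID.log
```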

If anyone has done this please share the details.

Regards
Jay


Spark Thrift Server - Autoscaling on K8

2023-03-08 Thread Jayabindu Singh
Hi All,

We are in the process of moving our workloads to K8s and are looking for
guidance on running Spark Thrift Server on K8s.
We need the executor pods to autoscale based on the workload rather than
running with a static number of executors.
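Concretely, we are looking at launching STS roughly like this (a sketch; the
master URL and executor bounds are placeholders) and want the executor count
to track the query load between the min and max:

```
$SPARK_HOME/sbin/start-thriftserver.sh \
  --master k8s://https://<k8s-apiserver>:6443 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20
```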

If anyone has done this and can share the details, it would be really
appreciated.

Regards
Jayabindu Singh


Re: ADLS Gen2 adfs sample yaml configuration

2023-02-15 Thread Jayabindu Singh
Here you go. Please update the values for your specific bucket.

spark-defaults.conf - to make sure event logs go to ADLS

spark.eventLog.enabled        true

spark.eventLog.dir            abfss://
containen...@storageaccount.dfs.core.windows.net/tenant/spark/eventlogs

spark.history.fs.logDirectory abfss://
containen...@storageaccount.dfs.core.windows.net/tenant/spark/eventlogs



core-site.xml

<configuration>

  <property>
    <name>fs.abfss.impl</name>
    <value>org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem</value>
  </property>

  <property>
    <name>fs.adl.oauth2.access.token.provider.type</name>
    <value>ClientCredential</value>
  </property>

  <property>
    <name>fs.adl.oauth2.refresh.url</name>
    <value>https://login.microsoftonline.com//oauth2/v2.0/authorize</value>
  </property>

  <property>
    <name>fs.adl.oauth2.client.id</name>
    <value>client_id</value>
  </property>

  <property>
    <name>fs.adl.oauth2.credential</name>
    <value>client secret</value>
  </property>

  <property>
    <name>fs.azure.account.key.incortapocstorage.dfs.core.windows.net</name>
    <value>access key</value>
  </property>

</configuration>

You need to have the ADLS jars on the Spark classpath. I just copy them
from hadoop/share/tools/lib to spark/jars.
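For reference, the copy is along these lines (exact jar names and versions
vary by Hadoop release):

```
cp $HADOOP_HOME/share/hadoop/tools/lib/hadoop-azure-*.jar  $SPARK_HOME/jars/
cp $HADOOP_HOME/share/hadoop/tools/lib/azure-storage-*.jar $SPARK_HOME/jars/
```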


Regards

Jay






On Tue, Feb 14, 2023 at 9:50 PM Kondala Ponnaboina (US)
 wrote:

> Hello,
>
> I need help with / a sample for configuring ADLS Gen2 (ADFS, Active
> Directory Federation Services) in a yaml file with the Spark history
> server.
>
> I would like to see running jobs from a JupyterLab notebook with the
> SparkOnK8sV3.0.2 kernel shell.
>
>
>
> Any help is much appreciated ..
>
>
>
> Thanks,
>
> Kondal
>
>
>
>
>
>
>