Spark Thrift Server - Not Scaling Down Executors 3.4.2+
Dear Spark Users,

We have run into an issue where autoscaling with STS works fine on Spark 3.3.2, but with 3.4.2 or 3.5.2 executors are left behind and never scale down. The driver makes a call to remove the executor, but some (not all) executors never get removed. Has anyone else noticed this, or is anyone aware of any reported issues? Any help will be greatly appreciated.

Regards,
Jay
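Not part of Jay's report, but for anyone comparing setups: on K8s (no external shuffle service) scale-down is governed by the dynamic-allocation and shuffle-tracking settings. The property names below are standard Spark configs; the values are illustrative assumptions, not a known fix for the 3.4.2+ behavior:

```properties
# Illustrative spark-defaults.conf fragment for STS on K8s (values are examples)
spark.dynamicAllocation.enabled                  true
# No external shuffle service on K8s, so shuffle tracking is the usual way
# to allow executors to be released
spark.dynamicAllocation.shuffleTracking.enabled  true
# Executors holding shuffle data are only removed after this timeout;
# without it they can be kept alive indefinitely
spark.dynamicAllocation.shuffleTracking.timeout  300s
spark.dynamicAllocation.executorIdleTimeout      60s
spark.dynamicAllocation.minExecutors             0
spark.dynamicAllocation.maxExecutors             20
```

If executors survive the driver's remove call, comparing the effective values of these settings between 3.3.2 and 3.4.2+ (and the executor-removal messages in the driver log) is a reasonable first step.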
Re: Seeking Guidance on Spark on Kubernetes Secrets Configuration
Hi Jon,

Using IAM, as suggested by Jörn, is the best approach. We recently moved our Spark workload from HDP to Spark on K8s and are utilizing IAM. It will save you from secret-management headaches, allows much more flexibility in access control, and gives you the option to allow access to multiple S3 buckets from the same pod. We have implemented this across Azure, Google, and AWS. Azure does require some extra work to make it work.

On Sat, Sep 30, 2023 at 12:05 PM Jörn Franke wrote:
> Don't use static IAM (S3) credentials. It is an outdated, insecure method -
> even AWS recommends against using it for anything (cf. e.g.
> https://docs.aws.amazon.com/cli/latest/userguide/cli-authentication-user.html).
> It is almost a guarantee to get your data stolen and your account
> manipulated.
>
> If you need to use Kubernetes (which has its own very problematic security
> issues), then assign AWS IAM roles with minimal permissions to the pods (for
> EKS this means using OIDC, cf.
> https://docs.aws.amazon.com/eks/latest/userguide/service_IAM_role.html).
>
> On 30.09.2023 at 03:41, Jon Rodríguez Aranguren <jon.r.arangu...@gmail.com> wrote:
>
>> Dear Spark Community Members,
>>
>> I trust this message finds you all in good health and spirits.
>>
>> I'm reaching out to the collective expertise of this esteemed community
>> with a query regarding Spark on Kubernetes. As a newcomer, I have always
>> admired the depth and breadth of knowledge shared within this forum, and it
>> is my hope that some of you might have insights on a specific challenge I'm
>> facing.
>>
>> I am currently trying to configure multiple Kubernetes secrets, notably
>> multiple S3 keys, at the SparkConf level for a Spark application. My
>> objective is to understand the best approach or methods to ensure that
>> these secrets can be smoothly accessed by the Spark application.
>>
>> If any of you have previously encountered this scenario or possess
>> relevant insights on the matter, your guidance would be highly beneficial.
>>
>> Thank you for your time and consideration. I'm eager to learn from the
>> experiences and knowledge present within this community.
>>
>> Warm regards,
>> Jon
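To make the EKS suggestion above concrete, here is a minimal sketch of the IRSA (IAM Roles for Service Accounts) setup. The service-account name, namespace, and role ARN are placeholders, not anything from this thread:

```yaml
# Sketch only: a ServiceAccount annotated with an IAM role via EKS IRSA.
# The role ARN, name, and namespace below are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-sa
  namespace: spark
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/spark-s3-read
```

The driver and executor pods then run under that service account (e.g. spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa), and S3A can pick up the injected web-identity credentials via spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider, so no static keys ever reach the pod.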
Spark Log Shipper to Cloud Bucket
Greetings Everyone!

We need to ship Spark (driver and executor) logs (not Spark event logs) from K8s to a cloud bucket (ADLS/S3). Using Fluent Bit we are able to ship the log files, but only to one single path, container/logs/. This will cause a huge number of files in a single folder and will create performance issues for list and search operations on files. What we would like to do is dynamically create the output folder, which could be named after the Spark app. If anyone has done this, please share the details.

Regards,
Jay
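One pattern that may fit (a sketch, not a tested answer to this thread): let the Fluent Bit kubernetes filter attach pod labels, rewrite the tag from a label carrying the Spark app name, and reference the tag in the S3 key. The label key spark-app-name, bucket name, and path layout below are all assumptions; check which labels your submitter actually sets on the pods:

```ini
# Hypothetical Fluent Bit sketch: fold the Spark app name (from a pod label)
# into the tag, then use the tag to build a per-app folder in the S3 key.
[FILTER]
    Name           kubernetes
    Match          kube.*
    Labels         On

[FILTER]
    Name           rewrite_tag
    Match          kube.*
    # Label key is an assumption; adjust to whatever your pods carry
    Rule           $kubernetes['labels']['spark-app-name'] ^(.+)$ sparklogs.$1 false

[OUTPUT]
    Name           s3
    Match          sparklogs.*
    bucket         my-spark-logs
    # $TAG[1] is the app name captured above; layout is illustrative
    s3_key_format  /logs/$TAG[1]/%Y/%m/%d/$UUID.log
```

For ADLS a similar tag-based path can be used with whichever output plugin you ship through (e.g. an azure blob output), since the dynamic part lives in the tag rather than in the output plugin itself.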
Spark Thrift Server - Autoscaling on K8
Hi All,

We are in the process of moving our workloads to K8s and are looking for guidance on running Spark Thrift Server on K8s. We need the executor pods to autoscale based on the workload rather than running with a static number of executors. If anyone has done this and can share the details, it would be really appreciated.

Regards,
Jayabindu Singh
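As a starting point (a sketch under assumptions, not a recipe from this thread): STS is a long-running driver, so executor autoscaling is usually done with Spark's own dynamic allocation rather than a K8s HPA. The API-server URL, image, namespace, and limits below are placeholders:

```shell
# Illustrative only: launch Spark Thrift Server against a K8s master with
# dynamic allocation enabled. All bracketed values are placeholders.
./sbin/start-thriftserver.sh \
  --master k8s://https://<api-server>:6443 \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20
```

With shuffle tracking enabled, idle executors are released once any shuffle data they hold is no longer needed, so the pod count follows the query load between minExecutors and maxExecutors.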
Re: ADLS Gen2 adfs sample yaml configuration
here you go. Please update the values for your specific bucket.

spark-defaults.conf - to make sure event logs go to ADLS:

spark.eventLog.enabled true
spark.eventLog.dir abfss://containen...@storageaccount.dfs.core.windows.net/tenant/spark/eventlogs
spark.history.fs.logDirectory abfss://containen...@storageaccount.dfs.core.windows.net/tenant/spark/eventlogs

core-site.xml:

fs.abfss.impl org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem
fs.adl.oauth2.access.token.provider.type ClientCredential
fs.adl.oauth2.refresh.url https://login.microsoftonline.com//oauth2/v2.0/authorize
fs.adl.oauth2.client.id client_id
fs.adl.oauth2.credential client secret
fs.azure.account.key.incortapocstorage.dfs.core.windows.net access key

You need to have the ADLS jars in the Spark classpath. I just copy these from hadoop/share/tools/lib to spark/jars.

Regards,
Jay

On Tue, Feb 14, 2023 at 9:50 PM Kondala Ponnaboina (US) wrote:
> Hello,
>
> I need help with a sample adfs (Active Directory Federation Services / ADLS
> Gen2) configuration - how do I configure the ADLS Gen2 (adfs) settings in a
> yaml file for the Spark History Server?
>
> I would like to see running jobs from a JupyterLab notebook with the
> SparkOnK8sV3.0.2 kernel shell.
>
> Any help is much appreciated.
>
> Thanks,
> Kondal
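For reference, the flat core-site.xml entries listed above map onto Hadoop's XML property format like this; the storage-account name and key are placeholders, not values from this thread:

```xml
<!-- Sketch of core-site.xml for ABFSS access; values are placeholders -->
<configuration>
  <property>
    <name>fs.abfss.impl</name>
    <value>org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem</value>
  </property>
  <property>
    <name>fs.azure.account.key.STORAGE_ACCOUNT.dfs.core.windows.net</name>
    <value>ACCESS_KEY</value>
  </property>
</configuration>
```

The same key/value pairs can alternatively be passed as Spark configs by prefixing them with spark.hadoop. in spark-defaults.conf, which avoids maintaining a separate core-site.xml.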