Re: How to determine the function of tasks on each stage in an Apache Spark application?

2023-04-12 Thread Maytas Monsereenusorn
Hi, I was wondering: if it's not possible to map tasks to functions, is it still possible to easily figure out which job and stage completed which part of the query from the UI? For example, in the SQL tab of the Spark UI, I am able to see the query and the Job IDs for that query. However,
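One way to make that mapping easier is to label jobs from the driver before each action; the label then shows up next to the Job IDs in the UI. A minimal PySpark sketch, with an illustrative description string:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(100)

    # Label all jobs triggered until the label is cleared; the Spark UI
    # shows this text next to the corresponding Job IDs.
    spark.sparkContext.setJobDescription("count step of the example query")
    df.count()
    spark.sparkContext.setJobDescription(None)  # clear the label again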

Re: Accessing python runner file in AWS EKS kubernetes cluster as in local://

2023-04-12 Thread Mich Talebzadeh
Thanks! I will have a look. Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use

Re: Accessing python runner file in AWS EKS kubernetes cluster as in local://

2023-04-12 Thread Bjørn Jørgensen
Yes, it looks inside the Docker container's folders. It will work if you are using s3 or gs. On Wed, 12 Apr 2023, 18:02 Mich Talebzadeh wrote: > Hi, > > In my spark-submit to the EKS cluster, I use the standard code to submit to > the cluster as below: > > spark-submit --verbose \ >--master
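To illustrate the point, the dependency can be staged on object storage and pulled at runtime instead of being resolved inside the image. A minimal PySpark sketch, assuming a hypothetical bucket and that the S3 connector (hadoop-aws) is on the classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Pull the zipped package from object storage rather than from a
    # local:// path, which is resolved inside the container image.
    spark.sparkContext.addPyFile("s3a://my-bucket/artifacts/spark_on_eks.zip")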

Accessing python runner file in AWS EKS kubernetes cluster as in local://

2023-04-12 Thread Mich Talebzadeh
Hi, In my spark-submit to the EKS cluster, I use the standard code to submit to the cluster as below: spark-submit --verbose \ --master k8s://$KUBERNETES_MASTER_IP:443 \ --deploy-mode cluster \ --name sparkOnEks \ --py-files local://$CODE_DIRECTORY/spark_on_eks.zip \

Re: Re: spark streaming and kinesis integration

2023-04-12 Thread Mich Talebzadeh
Hi Lingzhe Sun, Thanks for your comments. I am afraid I won't be able to take part in this project and contribute. HTH Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile

Re: How to determine the function of tasks on each stage in an Apache Spark application?

2023-04-12 Thread Jacek Laskowski
Hi, tl;dr it's not possible to "reverse-engineer" tasks to functions. In essence, Spark SQL is an abstraction layer over the RDD API, which is made up of partitions and tasks. Tasks are Scala functions (possibly with some Python for PySpark). A simple-looking high-level operator like DataFrame.join can
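The expansion is easy to see by printing the physical plan. A small PySpark sketch with arbitrary column names:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    left = spark.range(10).withColumnRenamed("id", "k")
    right = spark.range(10).withColumnRenamed("id", "k")
    # One high-level operator expands into several physical steps
    # (exchanges, joins, WholeStageCodegen), which run as stages and tasks.
    left.join(right, "k").explain()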

Re: Re: spark streaming and kinesis integration

2023-04-12 Thread Lingzhe Sun
Hi Rajesh, It's working fine, at least for now. But you'll need to build your own Spark image using later versions. Lingzhe Sun Hirain Technologies Original: From: Rajesh Katkar Date: 2023-04-12 21:36:52 To: Lingzhe Sun Cc: Mich Talebzadeh, user Subject: Re: Re: spark streaming and

Re: Re: spark streaming and kinesis integration

2023-04-12 Thread Yi Huang
unsubscribe On Wed, Apr 12, 2023 at 3:59 PM Rajesh Katkar wrote: > Hi Lingzhe, > > We have also started using this operator. > Do you see any issues with it? > > > On Wed, 12 Apr, 2023, 7:25 am Lingzhe Sun, wrote: > >> Hi Mich, >> >> FYI we're using spark operator( >>

Re: Re: spark streaming and kinesis integration

2023-04-12 Thread Rajesh Katkar
Hi Lingzhe, We have also started using this operator. Do you see any issues with it? On Wed, 12 Apr, 2023, 7:25 am Lingzhe Sun, wrote: > Hi Mich, > > FYI we're using spark operator( > https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) to build > stateful structured streaming on k8s
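For context, "stateful structured streaming" refers to queries that keep state across micro-batches, which is why a checkpoint location is required. A minimal PySpark sketch with placeholder source, sink, and paths:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    events = spark.readStream.format("rate").load()  # toy built-in source
    counts = events.groupBy("value").count()         # stateful aggregation

    query = (counts.writeStream
             .outputMode("update")
             .format("console")                          # placeholder sink
             .option("checkpointLocation", "/tmp/ckpt")  # placeholder path
             .start())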

Re: [SparkSQL, SparkUI, RESTAPI] How to extract the WholeStageCodeGen ids from SparkUI

2023-04-12 Thread Jacek Laskowski
Hi, You could use QueryExecutionListener or Spark listeners to intercept query execution events and extract whatever is required. That's what the web UI does (as it's simply a bunch of SparkListeners --> https://youtu.be/mVP9sZ6K__Y ;-)). Regards, Jacek Laskowski "The Internals Of" Online
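Since the subject also mentions the REST API: recent Spark versions expose SQL execution details, including plan node names such as WholeStageCodegen, under /api/v1/applications/<app-id>/sql. A hedged Python sketch; host, port, and the exact JSON field names should be verified against the monitoring docs for your Spark version:

    import requests

    base = "http://localhost:4040/api/v1"  # driver UI; adjust host/port
    app_id = requests.get(f"{base}/applications").json()[0]["id"]
    executions = requests.get(
        f"{base}/applications/{app_id}/sql", params={"details": "true"}
    ).json()
    for execution in executions:
        # Field names ("nodes", "nodeId", "nodeName") as of Spark 3.x docs.
        for node in execution.get("nodes", []):
            if "WholeStageCodegen" in node.get("nodeName", ""):
                print(execution["id"], node["nodeId"], node["nodeName"])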

PySpark tests fail with java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.sources.FakeSourceOne not found

2023-04-12 Thread Ranga Reddy
Hi Team, I am running the PySpark tests on Spark and they fail with *Provider org.apache.spark.sql.sources.FakeSourceOne not found*. Spark Version: 3.4.0/3.5.0 Python Version: 3.8.10 OS: Ubuntu 20.04 *Steps:* # /opt/data/spark/build/sbt -Phive clean package #

Re: Non string type partitions

2023-04-12 Thread Charles vinodh
There are other distributed execution engines (like Hive and Trino) that do support non-string data types, such as date and integer, for partition columns. Any idea why this restriction exists in Spark? On Tue, 11 Apr 2023 at 20:34, Chitral Verma wrote: > Because the name of the directory
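The directory-name point from the quoted reply is easy to see: partition values are encoded into path segments on write, which is where the string representation enters. A small PySpark sketch with a placeholder output path:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "2023-04-11"), (2, "2023-04-12")], ["id", "dt"])
    # Each partition value becomes a directory segment on disk,
    # e.g. /tmp/events/dt=2023-04-11/part-....parquet
    df.write.partitionBy("dt").mode("overwrite").parquet("/tmp/events")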