High number of tasks when run on a hybrid cluster
Hi,

I recently created a Spark cluster on AWS EMR using a fleet configuration with mixed instance types; the types actually provisioned vary with availability. While running the same Spark applications that ran fine on a homogeneous cluster (some PySpark apps doing DataFrame operations), I've observed a big spike in the number of tasks (roughly 1000x). I'm having a hard time understanding how a heterogeneous cluster can affect the task count, and I would really appreciate it if someone could point me in the right direction.

The cluster was on Spark 2.4.6 with:
spark.sql.adaptive.enabled=true
spark.dynamicAllocation.enabled=true

Kind regards
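One place to look is how the shuffle partition count is derived. This is a sketch, assuming the spike comes from shuffle partitioning: on Spark 2.4, enabling adaptive execution replaces the fixed shuffle partition count with one derived from a target input size, and with dynamic allocation on a mixed fleet the available parallelism also moves around. The values below are illustrative defaults, not recommendations:

```properties
# With adaptive execution off, every shuffle uses a fixed partition count:
spark.sql.shuffle.partitions  200

# With spark.sql.adaptive.enabled=true on Spark 2.4, the post-shuffle
# partition count is derived from this target size instead, so it can
# change as the cluster shape and data layout change:
spark.sql.adaptive.shuffle.targetPostShuffleInputSize  64m

# For file-based sources, the number of input tasks also depends on
# split size, independent of the cluster:
spark.sql.files.maxPartitionBytes  128m
```

Comparing these settings (and the per-stage task counts in the Spark UI) between the two clusters should show whether the extra tasks come from input splits or from post-shuffle partitioning.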
Re: Spark program not receiving messages from Cloud Pubsub
I was able to get it working. It needed a SparkSession to be instantiated and to wait for a termination signal from the user. In my case I used a StreamingContext:
https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/streaming/StreamingContext.html

Pramod Biligiri

On Sun, Aug 7, 2022 at 9:59 AM Pramod Biligiri wrote:
> Hi,
>
> I have a simple Java program that reads messages from a Google Cloud
> Pubsub topic and prints them. It works correctly when I run the program as
> a standalone, but it fails to receive messages when run using spark-submit.
> It connects to the subscription using my authentication credentials, but
> doesn't receive any messages after that.
>
> 1. Do programs launched using spark-submit have to follow a different
>    structure in general? My program doesn't do any Spark-related stuff as
>    of now, but I'll be adding it later.
> 2. Are there working examples of Spark + Cloud Pubsub integration? I
>    came across a library called Apache Bahir, but is it a must to use a
>    library like that?
>
> The code for my example can be found here:
> https://github.com/pramodbiligiri/pubsub-spark
>
> Pramod Biligiri
Re: [Spark SQL] Omit Create Table Statement in Spark Sql
You have to call saveAsTable or register a view before you can run a SQL query against a DataFrame.

> As the title says: does Spark SQL have a feature like Flink's Catalog that
> lets you omit the `CREATE TABLE` statement and write the SQL query directly?
[Spark SQL] Omit Create Table Statement in Spark Sql
As the title says: does Spark SQL have a feature like Flink's Catalog that lets you omit the `CREATE TABLE` statement and write the SQL query directly?

comeonyfzhu
comeonyf...@163.com