High number of tasks when run on a hybrid cluster

2022-08-09 Thread murat migdisoglu
Hi,

I recently created a Spark cluster on AWS EMR using a fleet configuration
with hybrid instance types. The instance types in this cluster vary
depending on which type is available.

While running the same Spark applications that ran on the homogeneous
cluster (some PySpark apps doing DataFrame operations), I've observed a big
spike in the number of tasks (roughly 1000x).

I would really appreciate it if someone could point me in the right
direction. I'm having a hard time understanding how a hybrid-type cluster
can have an impact on the number of tasks.
The cluster was running Spark 2.4.6 with:
spark.sql.adaptive.enabled=true
spark.dynamicAllocation.enabled=true

Kind Regards
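
A minimal sketch, assuming the spike might come from the 2.4.x adaptive
shuffle tuning: pin spark.sql.adaptive.enabled and
spark.sql.shuffle.partitions, then print the partition counts so the two
clusters can be compared directly. The input path and column name are
placeholders, not from the original post.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("task-count-check")
    # Pinning these removes two knobs that scale with cluster shape:
    .config("spark.sql.adaptive.enabled", "false")   # 2.4.x adaptive shuffle tuning
    .config("spark.sql.shuffle.partitions", "200")   # fixed post-shuffle parallelism
    .getOrCreate()
)

df = spark.read.parquet("s3://bucket/events/")       # placeholder path
print("input partitions:", df.rdd.getNumPartitions())
agg = df.groupBy("key").count()                      # placeholder column
print("post-shuffle partitions:", agg.rdd.getNumPartitions())

If the post-shuffle count stays at 200 with these settings pinned, the
spike on the hybrid cluster is coming from adaptive execution reacting to
the different cluster shape rather than from the input data itself.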


Re: Spark program not receiving messages from Cloud Pubsub

2022-08-09 Thread Pramod Biligiri
I was able to get it working. It needed a SparkSession to be instantiated,
and the driver has to wait for a termination signal from the user. In my
case I used a StreamingContext -
https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/streaming/StreamingContext.html
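
A minimal PySpark sketch of that pattern (the original program is in Java,
so this is an illustration, not the poster's code). The queueStream is a
placeholder standing in for the Pub/Sub source, since a StreamingContext
needs at least one output operation before start():

from pyspark.sql import SparkSession
from pyspark.streaming import StreamingContext

spark = SparkSession.builder.appName("pubsub-reader").getOrCreate()
ssc = StreamingContext(spark.sparkContext, 10)        # 10-second batches

# Placeholder source standing in for the Pub/Sub receiver.
queue = [spark.sparkContext.parallelize(["hello", "world"])]
ssc.queueStream(queue).pprint()                       # an output operation is required

ssc.start()
ssc.awaitTermination()   # block the driver until it is stopped; without this,
                         # a spark-submit job exits before any messages arrive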

Pramod Biligiri

On Sun, Aug 7, 2022 at 9:59 AM Pramod Biligiri wrote:

> Hi,
>
> I have a simple Java program that reads messages from a Google Cloud
> Pubsub topic and prints them. It works correctly when I run the program as
> a standalone, but it fails to receive messages when run using spark-submit.
> It connects to the subscription using my authentication credentials, but
> doesn't receive any messages after that.
>
>1. Do programs launched using spark-submit have to follow a different
>    structure in general? My program doesn't do any Spark-related stuff as of
>now, but I'll be adding it later.
>2. Are there working examples of Spark + Cloud Pubsub integration? I
>came across a library called Apache Bahir, but is it a must to use a
>library like that?
>
> The code for my example can be found here:
> https://github.com/pramodbiligiri/pubsub-spark
>
> Pramod Biligiri
>


Re: [Spark SQL] Omit Create Table Statement in Spark SQL

2022-08-09 Thread pengyh

You have to saveAsTable or register a view before you can run a SQL query
against a DataFrame.
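
A minimal sketch of that suggestion: register the DataFrame as a temporary
view and query it with spark.sql, with no CREATE TABLE statement involved.
The input path and column names are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("view-demo").getOrCreate()
df = spark.read.parquet("/tmp/orders.parquet")   # placeholder input
df.createOrReplaceTempView("orders")             # no CREATE TABLE needed

spark.sql(
    "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
).show()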


As the title says, does Spark SQL have a feature like the Flink Catalog
that lets you omit the `Create Table` statement and write a SQL query
directly?





[Spark SQL] Omit Create Table Statement in Spark SQL

2022-08-09 Thread 阿强
As the title says, does Spark SQL have a feature like the Flink Catalog
that lets you omit the `Create Table` statement and write a SQL query
directly?
comeonyfzhu
comeonyf...@163.com