Re: Does spark read the same file twice, if two stages are using the same DataFrame?

2023-05-09 Thread Mich Talebzadeh
When I run this job in local mode (spark-submit --master local[4]) with

spark = SparkSession.builder \
    .appName("tests") \
    .enableHiveSupport() \
    .getOrCreate()
spark.conf.set("spark.sql.adaptive.enabled", "true")
df3.explain(extended=True)

and no caching, I see this

Re: Can Spark SQL (not DataFrame or Dataset) aggregate array into map of element of count?

2023-05-09 Thread Yong Zhang
Hi, Mich: Thanks for your reply, but maybe I didn't make my question clear. I am looking for a way to compute the count of each element in an array, without "exploding" the array, and to output the result as a Map column. For example, given an array such as ('a', 'b', 'a'), I want to output a
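One possible way to express this in plain Spark SQL, sketched here under assumptions and not taken from the thread: use the higher-order functions available since Spark 2.4 (array_distinct, transform, filter, size, map_from_arrays). The column name `arr`, the view name `t`, and the helper names below are hypothetical, and the sketch assumes the array holds no NULL elements, since map keys cannot be NULL.

```python
from collections import Counter

# Hypothetical column name `arr`. For each distinct element x, count how many
# elements of the array equal x, then zip the distinct elements and the counts
# into a map. No explode() is involved.
ELEMENT_COUNT_SQL = """
map_from_arrays(
  array_distinct(arr),
  transform(array_distinct(arr), x -> size(filter(arr, y -> y = x)))
)
"""


def element_counts(values):
    """Pure-Python reference for what the SQL expression is meant to compute."""
    return dict(Counter(values))


def run_with_spark():
    """Evaluate the expression on a one-row view (requires pyspark and a JVM)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("array-element-counts")
        .getOrCreate()
    )
    # Hypothetical view `t` holding the example array from the question.
    spark.sql("SELECT array('a', 'b', 'a') AS arr").createOrReplaceTempView("t")
    return spark.sql(f"SELECT {ELEMENT_COUNT_SQL} AS counts FROM t").collect()
```

The expression rescans the array once per distinct element, so it is quadratic in the array length in the worst case; that is usually acceptable for short arrays but worth measuring on long ones.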

Re: Does spark read the same file twice, if two stages are using the same DataFrame?

2023-05-09 Thread Nitin Siwach
I do not think InMemoryFileIndex means the data is being cached. Cached data shows up as InMemoryTableScan. InMemoryFileIndex is used only for partition discovery and partition pruning, so any file read will show up as a scan over an InMemoryFileIndex. It is a cached file index, not cached data. Please
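The distinction can be checked by looking for InMemoryTableScan in the physical plan. The following is a minimal sketch, assuming PySpark and a Parquet input; the helper names and the `path` argument are hypothetical, and the plan text is captured by redirecting the output that `DataFrame.explain()` prints.

```python
import contextlib
import io


def plan_text(df):
    """Capture what df.explain() prints and return it as a string."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return buf.getvalue()


def reads_from_cache(plan):
    """True if the plan scans cached data, not merely a cached file index."""
    return "InMemoryTableScan" in plan


def compare_plans(path):
    """Return (uncached, cached) flags for the same input (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    spark = SparkSession.builder.appName("cache-check").getOrCreate()
    # Plan the uncached read first: once a matching plan is cached, later
    # reads of the same files may be planned against the cache as well.
    uncached = reads_from_cache(plan_text(spark.read.parquet(path)))
    cached_df = spark.read.parquet(path).cache()
    cached = reads_from_cache(plan_text(cached_df))
    return uncached, cached
```

Under these assumptions, the first plan should mention InMemoryFileIndex in its file scan without any InMemoryTableScan, while the second should contain InMemoryTableScan.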

Re: Does spark read the same file twice, if two stages are using the same DataFrame?

2023-05-09 Thread Mich Talebzadeh
When you run this in yarn mode, it uses a Broadcast Hash Join for the join operation, as shown in the following output. The datasets here are the same size, so it broadcasts one dataset to all of the executors, then reads the same dataset and does a hash join. This is typical of joins. No surprises

unsubscribe

2023-05-09 Thread Balakumar iyer S