Hi all,

I ran two Spark SQL statements that read the same table and partition but
write to different tables. Is there any way to merge them into one
statement so that the read is performed only once?

Suppose there are two SQL statements:
-----------------------------------------------------------------------------------------------------------------
INSERT OVERWRITE TABLE spark_input_test2 PARTITION(dt='20200909')
SELECT name, number, age
FROM spark_input_test  WHERE dt='20200908'
-----------------------------------------------------------------------------------------------------------------
INSERT OVERWRITE TABLE spark_input_test1 PARTITION(dt='20200909')
SELECT name, number, sex
FROM spark_input_test  WHERE dt='20200908'
-----------------------------------------------------------------------------------------------------------------
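One possibility is Hive-style multi-insert syntax, which Spark SQL also accepts: both writes are expressed over a single FROM clause. Whether the optimizer actually shares the scan can depend on the Spark version, so it is worth checking with EXPLAIN. A sketch using the tables above:

```sql
FROM spark_input_test
INSERT OVERWRITE TABLE spark_input_test2 PARTITION(dt='20200909')
  SELECT name, number, age WHERE dt='20200908'
INSERT OVERWRITE TABLE spark_input_test1 PARTITION(dt='20200909')
  SELECT name, number, sex WHERE dt='20200908'
```

Here the WHERE filter is repeated per insert, following the usual multi-insert form.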

Running these two SQL statements generates two physical plans, so the
source table "spark_input_test" is scanned twice. If spark_input_test
were read only once, it would save I/O and memory.
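If the two INSERT statements must stay separate, another option (a sketch, reusing the table names from the example above and assuming the filtered partition fits in the cache) is to materialize the source rows once with CACHE TABLE and have both inserts read the cached result:

```sql
-- Scan the source partition once and cache it under a temporary name
CACHE TABLE src_20200908 AS
  SELECT name, number, age, sex
  FROM spark_input_test WHERE dt='20200908';

INSERT OVERWRITE TABLE spark_input_test2 PARTITION(dt='20200909')
  SELECT name, number, age FROM src_20200908;

INSERT OVERWRITE TABLE spark_input_test1 PARTITION(dt='20200909')
  SELECT name, number, sex FROM src_20200908;

-- Free the cached data when done
UNCACHE TABLE src_20200908;
```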


Cheers,
Gang Li





