It's not possible to load everything into memory. Use the BigQuery connector for Spark (one already exists: spark-bigquery-connector) and register tableB and tableC as temp views in Spark; the DataFrames stay lazy, so Spark only pulls the data the query actually needs.
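
Roughly something like the sketch below (Scala), assuming the spark-bigquery-connector is on the classpath; the project/dataset/table names and the writeMethod option are just placeholders for illustration, not a definitive setup:

import org.apache.spark.sql.SparkSession

object SqlScriptOnBigQuery {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sqlscript-on-bigquery")
      .getOrCreate()

    // Read the source tables through the connector. These DataFrames are lazy;
    // nothing is materialized until an action or the final write runs.
    val tableB = spark.read.format("bigquery")
      .option("table", "my_project.my_dataset.tableB") // placeholder table id
      .load()
    val tableC = spark.read.format("bigquery")
      .option("table", "my_project.my_dataset.tableC") // placeholder table id
      .load()

    // Register temp views so the SQL taken from the sqlscript file can refer
    // to the tables by name.
    tableB.createOrReplaceTempView("tableB")
    tableC.createOrReplaceTempView("tableC")

    // Run the SELECT part of the script.
    val result = spark.sql(
      "select b.columnB1, c.columnC2 from tableB b, tableC c")

    // Overwriting tableA covers the delete + insert pair in the script.
    // Depending on the connector version you may need the indirect write path
    // with a temporaryGcsBucket option instead of writeMethod=direct.
    result.write.format("bigquery")
      .option("table", "my_project.my_dataset.tableA") // placeholder table id
      .option("writeMethod", "direct")
      .mode("overwrite")
      .save()

    spark.stop()
  }
}

Since the connector pushes column pruning and filters down to BigQuery, registering ten views does not mean reading ten full tables into memory.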
On Fri, May 14, 2021 at 8:50 AM bo zhao <zhaobo20082...@gmail.com> wrote:
> Hi Team,
>
> I've followed the Spark community for several years. This is my first time
> asking for help. I hope you can share some experience.
>
> I want to develop a Spark application that processes a sqlscript file.
> The data is on BigQuery.
> For example, the sqlscript is:
>
> delete from tableA;
> insert into tableA select b.columnB1, c.columnC2 from tableB b, tableC c;
>
> I can parse this file. In my opinion, after parsing the file, the steps
> should be as follows:
>
> #step1: read tableB and tableC into memory (Spark)
> #step2: register views for tableB's dataframe and tableC's dataframe
> #step3: use spark.sql("select b.columnB1, c.columnC2 from tableB b, tableC c") to get a new dataframe
> #step4: new dataframe.write()... to tableA using mode of "OVERWRITE"
>
> My questions:
> #1 If there are 10 or more tables, do I need to read each table into
> memory, even though Spark is based on in-memory computation?
> #2 Is there an easier way to handle my scenario? For example, I just
> define the datasource (BigQuery) and parse the sqlscript file, and the
> rest is run by Spark.
>
> Please share your experience or ideas.