Hi Kant,
Based on my understanding, the only difference is the overhead of
selecting/creating the SQLContext for the query you pass in. Since the
table/view is already registered, sparkSession.sql("your query")
should be simple and good enough.
The following uses the session/context that is created and available by
default:

    sparkSession.sql("select value from table")

while the following would look up or create one and then run the query
(which I believe adds extra overhead):

    df.sqlContext().sql("select value from table")
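For what it's worth, here is a minimal batch sketch of the two call paths
side by side (local master, class name, and the sample data are illustrative,
not from your code; I dropped the Kafka source so the snippet is
self-contained):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SqlContextSketch {
        public static void main(String[] args) {
            // Illustrative local session; your session setup will differ.
            SparkSession spark = SparkSession.builder()
                    .master("local[*]")
                    .appName("sql-context-sketch")
                    .getOrCreate();

            // Sample data standing in for the Kafka stream.
            Dataset<Row> df = spark.range(3).toDF("value");
            df.createOrReplaceTempView("table");

            // Path 1: query through the session directly.
            Dataset<Row> viaSession = spark.sql("select value from table");

            // Path 2: query through the Dataset's SQLContext.
            Dataset<Row> viaContext = df.sqlContext().sql("select value from table");

            viaSession.show();
            viaContext.show();

            spark.stop();
        }
    }

Both calls resolve the same temp view, so the results are identical; the
only question is which context object dispatches the query.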
Regards
Raj
On Wed, Dec 6, 2017 at 6:07 PM, kant kodali wrote:
> Hi All,
>
> I have the following snippets of the code and I wonder what is the
> difference between these two and which one should I use? I am using spark
> 2.2.
>
> Dataset df = sparkSession.readStream()
> .format("kafka")
> .load();
>
> df.createOrReplaceTempView("table");
> df.printSchema();
>
> Dataset resultSet = df.sqlContext().sql(
>     "select value from table"); //sparkSession.sql(this.query);
>
> StreamingQuery streamingQuery = resultSet
> .writeStream()
> .trigger(Trigger.ProcessingTime(1000))
> .format("console")
> .start();
>
>
> vs
>
>
> Dataset df = sparkSession.readStream()
> .format("kafka")
> .load();
>
> df.createOrReplaceTempView("table");
>
> Dataset resultSet = sparkSession.sql(
>     "select value from table"); //sparkSession.sql(this.query);
>
> StreamingQuery streamingQuery = resultSet
> .writeStream()
> .trigger(Trigger.ProcessingTime(1000))
> .format("console")
> .start();
>
>
> Thanks!
>
>