Not really, unless you’re doing something wrong (e.g. calling collect or similar,
which pulls the data back to the driver).

Inside foreachRDD you’re typically converting the RDD to a DataFrame and 
registering it as a temp table. The closure itself runs on the driver, but it 
only sets up the job; all the subsequent queries are executed in parallel on 
the workers.

I haven’t built production apps with this pattern, but I have successfully built 
a prototype where I execute dynamic SQL on top of a 15-minute window (obtained 
with .window on the DStream), and it works as expected.
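
A minimal sketch of that pattern, assuming the Spark 1.x API (SQLContext, 
registerTempTable). The socket source, the Event case class, the "events" table 
name and the query are all made up for illustration, not taken from my prototype:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

// Hypothetical record type, only here so the RDD can be turned into a DataFrame.
case class Event(word: String)

object StreamingSqlSketch {
  def main(args: Array[String]): Unit = {
    // Master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("StreamingSqlSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumed source: lines of text from a local socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    // 15-minute window, sliding by the batch interval (the default).
    val windowed = lines.window(Minutes(15))

    windowed.foreachRDD { rdd =>
      // This closure runs on the driver, but it only defines the work;
      // the RDD's data stays partitioned on the executors.
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._

      // Convert the RDD to a DataFrame and register it as a temp table.
      val events = rdd.flatMap(_.split(" ")).map(Event(_)).toDF()
      events.registerTempTable("events")

      // The query is planned on the driver and executed on the workers.
      val counts = sqlContext.sql(
        "SELECT word, COUNT(*) AS total FROM events GROUP BY word")

      // show() returns only the first 20 rows to the driver;
      // collect() would bring back the whole result.
      counts.show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The only data that reaches the driver here is whatever the final action asks 
for, so keep the result small (show, take) or write it out from the executors.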

Check this out for a code example: 
https://github.com/databricks/reference-apps/blob/master/logs_analyzer/chapter1/scala/src/main/scala/com/databricks/apps/logs/chapter1/LogAnalyzerStreamingSQL.scala

-adrian

From: Daniel Haviv
Date: Monday, October 12, 2015 at 12:52 PM
To: user
Subject: SQLContext within foreachRDD

Hi,
As things that run inside foreachRDD run at the driver, does that mean that if 
we use SQLContext inside foreachRDD the data is sent back to the driver and 
only then the query is executed or is it executed at the executors?


Thank you.
Daniel

