It is definitely not the case for Spark SQL. A temporary table (much like
a DataFrame) is just a logical plan with a name; it is not evaluated
unless a query is fired against it.
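For instance, here is a minimal sketch using the SQLContext-era API this
thread is about (the app name, table name, and data are made up) showing
that nothing runs until the query at the end:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql.types import StructType, StructField, IntegerType

    sc = SparkContext(appName="temp-table-laziness")  # hypothetical app name
    sqlContext = SQLContext(sc)

    rdd = sc.parallelize([(1,), (2,)])
    schema = StructType([StructField("id", IntegerType(), False)])

    # With an explicit schema, building the DataFrame and registering it
    # are pure metadata operations: only a logical plan is recorded here.
    df = sqlContext.createDataFrame(rdd, schema)
    df.registerTempTable("events")  # hypothetical table name

    # The plan is evaluated only now, when a query is fired on the table.
    print(sqlContext.sql("SELECT COUNT(*) FROM events").collect())

On newer releases the same idea applies with SparkSession and
createOrReplaceTempView.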
I am not sure that using rdd.take in the Python code to verify the schema
is the right approach, as it creates a Spark job.
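If I understand the problem, passing the schema explicitly should sidestep
that job entirely, and it also handles the empty-RDD case from the question
below. A rough sketch (the column names, table name, and app name are my
own inventions, and I am assuming Spark 1.3+ behavior where an explicit
StructType skips sampling):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    sc = SparkContext(appName="empty-rdd-demo")  # hypothetical app name
    sqlContext = SQLContext(sc)

    # Stand-in for the scan result; this time it happens to have no rows.
    maybe_empty = sc.parallelize([])

    # Without a schema, PySpark samples the RDD (a take/first under the
    # hood), which launches a job and fails on an empty RDD. An explicit
    # schema should skip that sampling step.
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])
    df = sqlContext.createDataFrame(maybe_empty, schema)
    df.registerTempTable("scan_result")  # hypothetical table name

    print(sqlContext.sql("SELECT COUNT(*) FROM scan_result").collect())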
BTW, why
I have an RDD queried from a scan of a data source. Sometimes the RDD has
rows and at other times it has none. I would like to register this RDD as
a temporary table in a SQL context. I suspect this will work in Scala, but
in PySpark some code assumes that the RDD has rows in it, which are used