Re: registering an empty RDD as a temp table in a PySpark SQL context

2015-08-18 Thread Hemant Bhanawat
It is definitely not the case for Spark SQL. A temporary table (much like a DataFrame) is just a logical plan with a name, and it is not evaluated until a query is fired against it. I am not sure that using rdd.take in the Python code to verify the schema is the right approach, as it creates a Spark job. BTW, why

registering an empty RDD as a temp table in a PySpark SQL context

2015-08-17 Thread Eric Walker
I have an RDD queried from a scan of a data source. Sometimes the RDD has rows, and at other times it has none. I would like to register this RDD as a temporary table in a SQL context. I suspect this will work in Scala, but in PySpark some code assumes that the RDD has rows in it, which are used