I have an RDD queried from a scan of a data source. Sometimes the RDD has rows and at other times it has none. I would like to register this RDD as a temporary table in a SQL context. I suspect this will work in Scala, but in PySpark some code assumes that the RDD has rows in it, which are used to verify the schema:
https://github.com/apache/spark/blob/branch-1.3/python/pyspark/sql/context.py#L299

Before I attempt to extend the Scala code to handle an empty RDD, or provide an empty DataFrame that can be registered, I was wondering what people recommend in this case. Perhaps there's a simple way of registering an empty RDD as a temporary table in a PySpark SQL context that I'm overlooking. An alternative is to add special-case logic in the client code to deal with an RDD backed by an empty table scan. But since the SQL will already handle that, I was hoping to avoid special-case logic.

Eric