I have an RDD produced by a scan of a data source.  Sometimes the RDD has
rows and at other times it has none.  I would like to register this RDD as
a temporary table in a SQL context.  I suspect this works in Scala, but the
PySpark code path assumes the RDD has rows in it, which it uses to verify
the schema:

https://github.com/apache/spark/blob/branch-1.3/python/pyspark/sql/context.py#L299
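Roughly, this is what I'm running into (a sketch, not my exact code; I'm
standing in for the data-source scan with parallelize):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext()          # or the sc from the pyspark shell
    sqlContext = SQLContext(sc)

    # Stand-in for the data-source scan; assume it came back empty.
    rdd = sc.parallelize([])

    # With no schema supplied, PySpark samples rows from the RDD to
    # infer one, so on an empty RDD this blows up (if I'm reading the
    # code right, first() raises ValueError because the RDD is empty).
    df = sqlContext.createDataFrame(rdd)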

Before I attempt to extend the Scala code to handle an empty RDD or provide
an empty DataFrame that can be registered, I was wondering what people
recommend in this case.  Perhaps there's a simple way of registering an
empty RDD as a temporary table in a PySpark SQL context that I'm
overlooking.
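For concreteness, this is the kind of thing I'd like to be able to do,
sketched with a made-up schema and table name (the real columns come from
the data source). Passing an explicit StructType seems like it should
sidestep inference, since there's nothing to sample:

    from pyspark.sql.types import (StructType, StructField,
                                   IntegerType, StringType)

    # Made-up schema standing in for the real columns from the scan.
    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
    ])

    # With an explicit schema, no rows need to be sampled, so an
    # empty RDD should (I hope) be acceptable here.
    df = sqlContext.createDataFrame(rdd, schema)
    df.registerTempTable("scan_results")   # made-up table name

    # SQL against the empty temp table should just return no rows.
    sqlContext.sql("SELECT * FROM scan_results").collect()   # []

Is something along those lines the recommended approach, or is there a
pitfall I'm missing?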

An alternative is to add special-case logic in the client code to deal
with an RDD backed by an empty table scan.  But since SQL queries already
handle empty tables gracefully, I was hoping to avoid special-case logic.

Eric
