[ https://issues.apache.org/jira/browse/SPARK-12348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059119#comment-15059119 ]
Hurshal Patel commented on SPARK-12348:
---------------------------------------

Whoops, I think this was intentional, but there is still value in returning a more complete error like "The RDD is empty, can not infer schema" instead of raising the generic error.

> PySpark _inferSchema crashes with incorrect exception on an empty RDD
> ---------------------------------------------------------------------
>
>                 Key: SPARK-12348
>                 URL: https://issues.apache.org/jira/browse/SPARK-12348
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.0
>            Reporter: Hurshal Patel
>            Priority: Minor
>
> {code}
> >>> rdd = sc.emptyRDD()
> >>> df = sqlContext.createDataFrame(rdd)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/memsql/spark/python/pyspark/sql/context.py", line 404, in createDataFrame
>     rdd, schema = self._createFromRDD(data, schema, samplingRatio)
>   File "/home/memsql/spark/python/pyspark/sql/context.py", line 285, in _createFromRDD
>     struct = self._inferSchema(rdd, samplingRatio)
>   File "/home/memsql/spark/python/pyspark/sql/context.py", line 229, in _inferSchema
>     first = rdd.first()
>   File "/home/memsql/spark/python/pyspark/rdd.py", line 1320, in first
>     raise ValueError("RDD is empty")
> ValueError: RDD is empty
> {code}
> This throws "RDD is empty" in rdd.first() instead of the intended message "The first row in RDD is empty, can not infer schema" in sqlContext._inferSchema.
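
A minimal sketch of the behaviour suggested in the comment, not the actual Spark patch: catch the generic `ValueError` from `rdd.first()` and re-raise it with a message that mentions schema inference. `ListRDDStub` and `first_row_for_inference` are hypothetical names used only for illustration. The stub stands in for a real RDD so the example runs without PySpark, and it assumes only what the traceback shows, namely that `first()` raises `ValueError("RDD is empty")` on an empty RDD.

```python
class ListRDDStub:
    """Hypothetical stand-in for an RDD, backed by a plain list.

    It mimics only the behaviour visible in the traceback:
    first() raises ValueError("RDD is empty") when there are no rows.
    """

    def __init__(self, rows):
        self._rows = list(rows)

    def first(self):
        if not self._rows:
            raise ValueError("RDD is empty")
        return self._rows[0]


def first_row_for_inference(rdd):
    """Return the first row, or raise an error that mentions schema inference.

    Hypothetical helper; it sketches the check that the comment proposes
    for the schema-inference step.
    """
    try:
        first = rdd.first()
    except ValueError:
        # Assumption: any ValueError from first() means the RDD is empty.
        # Replace the generic message with one that names the real problem.
        raise ValueError("The RDD is empty, can not infer schema")
    if not first:
        # The RDD has rows, but the first one carries no fields.
        raise ValueError("The first row in RDD is empty, can not infer schema")
    return first
```

With this shape, an empty RDD and an empty first row produce two distinct messages, so the caller can tell which case was hit. Catching every `ValueError` is a simplification; a real change would more likely test for emptiness explicitly.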