[ https://issues.apache.org/jira/browse/SPARK-22566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takuya Ueshin reassigned SPARK-22566:
-------------------------------------

    Assignee: Guilherme Berger

> Better error message for `_merge_type` in Pandas to Spark DF conversion
> -----------------------------------------------------------------------
>
>                 Key: SPARK-22566
>                 URL: https://issues.apache.org/jira/browse/SPARK-22566
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.2.0
>            Reporter: Guilherme Berger
>            Assignee: Guilherme Berger
>            Priority: Minor
>
> When creating a Spark DataFrame from a Pandas DataFrame without specifying a schema, schema inference is used. Inference can legitimately fail when a column contains values of two different types; that is expected. The problem is that the error message does not say in which column the conflict happened.
>
> When this happens, it is painful to debug because the error message is too vague. I plan to submit a PR that fixes this by producing a better error message for such cases, containing the column name (and possibly the problematic values too).
>
> Current behavior:
>
> >>> spark_session.createDataFrame(pandas_df)
>   File "redacted/pyspark/sql/session.py", line 541, in createDataFrame
>     rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "redacted/pyspark/sql/session.py", line 401, in _createFromLocal
>     struct = self._inferSchemaFromList(data)
>   File "redacted/pyspark/sql/session.py", line 333, in _inferSchemaFromList
>     schema = reduce(_merge_type, map(_infer_schema, data))
>   File "redacted/pyspark/sql/types.py", line 1124, in _merge_type
>     for f in a.fields]
>   File "redacted/pyspark/sql/types.py", line 1118, in _merge_type
>     raise TypeError("Can not merge type %s and %s" % (type(a), type(b)))
> TypeError: Can not merge type <class 'pyspark.sql.types.LongType'> and <class 'pyspark.sql.types.StringType'>
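>
> A minimal self-contained reproduction (the session and variable names here are only illustrative, not taken from the report above):
>
> import pandas as pd
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.master("local[1]").getOrCreate()
>
> # Column "b" holds a long in the first row and a string in the second,
> # so _infer_schema produces LongType for one row and StringType for the
> # other, and _merge_type refuses to merge them.
> pandas_df = pd.DataFrame({"a": [1, 2], "b": [10, "ten"]})
>
> # Raises: TypeError: Can not merge type <class 'pyspark.sql.types.LongType'>
> # and <class 'pyspark.sql.types.StringType'>, with no hint that the
> # offending column is "b".
> spark.createDataFrame(pandas_df)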
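>
> For reference, a rough sketch of the kind of change I have in mind. This is a hypothetical illustration, not the actual patch: it threads the field name through _merge_type so the TypeError can name the offending column (array and map element handling is omitted for brevity):
>
> from pyspark.sql.types import NullType, StructField, StructType
>
> def _merge_type(a, b, name=None):
>     # Prefix the message with the field name once we know it.
>     def with_name(msg):
>         return msg if name is None else ("field %s: %s" % (name, msg))
>
>     if isinstance(a, NullType):
>         return b
>     elif isinstance(b, NullType):
>         return a
>     elif type(a) is not type(b):
>         # The error now says which column could not be merged.
>         raise TypeError(with_name(
>             "Can not merge type %s and %s" % (type(a), type(b))))
>     elif isinstance(a, StructType):
>         # Merge struct fields pairwise, passing each field's name down
>         # so nested conflicts are also attributed to a column.
>         b_fields = dict((f.name, f.dataType) for f in b.fields)
>         fields = [StructField(f.name,
>                               _merge_type(f.dataType,
>                                           b_fields.get(f.name, NullType()),
>                                           name=f.name))
>                   for f in a.fields]
>         merged = set(f.name for f in fields)
>         fields.extend(f for f in b.fields if f.name not in merged)
>         return StructType(fields)
>     else:
>         # Same atomic type on both sides: nothing to merge.
>         return a
>
> With a change along these lines, the failure above would read something like "TypeError: field b: Can not merge type <class 'pyspark.sql.types.LongType'> and <class 'pyspark.sql.types.StringType'>", which points straight at the problem column.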