David Gingrich created SPARK-19507:
--------------------------------------

             Summary: pyspark.sql.types._verify_type() exceptions too broad to debug collections or nested data
                 Key: SPARK-19507
                 URL: https://issues.apache.org/jira/browse/SPARK-19507
             Project: Spark
          Issue Type: New Feature
          Components: PySpark
    Affects Versions: 2.1.0
         Environment: macOS Sierra 10.12.3
                      Spark 2.1.0, installed via Homebrew
            Reporter: David Gingrich
            Priority: Trivial

The private function pyspark.sql.types._verify_type() recursively checks an object against a datatype, raising an exception if the object does not satisfy the type. The exception messages are not specific enough to debug a data error in a collection or nested data, because they do not say which field or element failed. For instance:

```
>>> import pyspark.sql.types as typ
>>> schema = typ.StructType([typ.StructField('nest1',
...     typ.MapType(typ.StringType(), typ.ArrayType(typ.FloatType())))])
>>> typ._verify_type({'nest1': {'nest2': [1]}}, schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/david/src/3p/spark/python/pyspark/sql/types.py", line 1355, in _verify_type
    _verify_type(obj.get(f.name), f.dataType, f.nullable, name=new_name)
  File "/Users/david/src/3p/spark/python/pyspark/sql/types.py", line 1349, in _verify_type
    _verify_type(v, dataType.valueType, dataType.valueContainsNull, name=new_name)
  File "/Users/david/src/3p/spark/python/pyspark/sql/types.py", line 1342, in _verify_type
    _verify_type(i, dataType.elementType, dataType.containsNull, name=new_name)
  File "/Users/david/src/3p/spark/python/pyspark/sql/types.py", line 1325, in _verify_type
    % (name, dataType, obj, type(obj)))
TypeError: FloatType can not accept object 1 in type <class 'int'>
```

Passing a field name down through the recursive calls and printing it in the exception would make debugging easier.
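
To illustrate the idea of carrying a field name through the recursion, here is a rough standalone sketch. It does not use or modify PySpark: the classes `FloatT`, `StringT`, `ArrayT`, `MapT`, `FieldT`, `StructT` and the function `verify` are hypothetical stand-ins invented for this example, and nullability is ignored to keep it short. The only point is that each recursive call extends a `name` path, which is then prefixed to the error message.

```python
from dataclasses import dataclass
from typing import Any, List

# Hypothetical stand-ins for the pyspark.sql.types classes; NOT the real API.
@dataclass
class FloatT:
    pass

@dataclass
class StringT:
    pass

@dataclass
class ArrayT:
    element: Any

@dataclass
class MapT:
    key: Any
    value: Any

@dataclass
class FieldT:
    name: str
    dtype: Any

@dataclass
class StructT:
    fields: List[FieldT]


def _fail(name, dtype, obj):
    # Prefix the message with the path to the offending value.
    raise TypeError("%s: %s can not accept object %r in type %s"
                    % (name, type(dtype).__name__, obj, type(obj)))


def verify(obj, dtype, name="obj"):
    """Recursively check obj against dtype, extending `name` at each level."""
    if isinstance(dtype, FloatT):
        if not isinstance(obj, float):
            _fail(name, dtype, obj)
    elif isinstance(dtype, StringT):
        if not isinstance(obj, str):
            _fail(name, dtype, obj)
    elif isinstance(dtype, ArrayT):
        if not isinstance(obj, (list, tuple)):
            _fail(name, dtype, obj)
        for i, item in enumerate(obj):
            verify(item, dtype.element, "%s[%d]" % (name, i))
    elif isinstance(dtype, MapT):
        if not isinstance(obj, dict):
            _fail(name, dtype, obj)
        for k, v in obj.items():
            verify(k, dtype.key, "%s key %r" % (name, k))
            verify(v, dtype.value, "%s[%r]" % (name, k))
    elif isinstance(dtype, StructT):
        if not isinstance(obj, dict):
            _fail(name, dtype, obj)
        for f in dtype.fields:
            verify(obj.get(f.name), f.dtype, "%s.%s" % (name, f.name))
    else:
        raise TypeError("%s: unsupported type %r" % (name, dtype))


# Same shape as the schema in the report above.
schema = StructT([FieldT("nest1", MapT(StringT(), ArrayT(FloatT())))])
try:
    verify({"nest1": {"nest2": [1]}}, schema)
except TypeError as exc:
    print(exc)
    # obj.nest1['nest2'][0]: FloatT can not accept object 1 in type <class 'int'>
```

With a path like `obj.nest1['nest2'][0]` in the message, the bad element can be found without stepping through the traceback.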