Romi Kuntsman created SPARK-18621:
-------------------------------------

             Summary: PySpark SQL Types (aka DataFrame Schema) have __repr__() with Scala and not Python representation
                 Key: SPARK-18621
                 URL: https://issues.apache.org/jira/browse/SPARK-18621
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.0.2, 1.6.2
            Reporter: Romi Kuntsman
            Priority: Minor
When using Python's repr() on an object, the expected result is a string that Python can evaluate to reconstruct the object. See: https://docs.python.org/2/library/functions.html#func-repr

However, when getting a DataFrame schema in PySpark, the code (in the "__repr__()" overload methods) returns the Scala string representation rather than a Python one.

Relevant code in PySpark:
https://github.com/apache/spark/blob/5f02d2e5b4d37f554629cbd0e488e856fffd7b6b/python/pyspark/sql/types.py#L442

Python code:

    from pyspark.sql.types import StructType, StructField, StringType

    # 1. define object
    struct1 = StructType([StructField("f1", StringType(), True)])
    # 2. print representation, expected to look like the constructor call above
    print(repr(struct1))
    # 3. actual result:
    # StructType(List(StructField(f1,StringType,true)))
    # 4. try to use the result in code
    struct2 = StructType(List(StructField(f1,StringType,true)))
    # 5. get a bunch of errors:
    # Unresolved reference 'List'
    # Unresolved reference 'f1'
    # StringType is a class, not a constructed object
    # Unresolved reference 'true'

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
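For illustration, a minimal sketch of the behaviour this issue asks for: a repr() that yields the Python constructor call, so that eval(repr(x)) rebuilds an equal object. The classes below are hypothetical stand-ins named after the pyspark.sql.types classes; they are not PySpark's actual implementation and only model the three pieces used in the example above.

```python
# Hypothetical stand-ins for pyspark.sql.types classes (NOT PySpark's real code),
# showing a repr() that round-trips through eval().

class StringType:
    def __repr__(self):
        # An instance, not the bare class name, so eval() builds an object
        return "StringType()"

    def __eq__(self, other):
        return isinstance(other, StringType)


class StructField:
    def __init__(self, name, dataType, nullable=True):
        self.name = name
        self.dataType = dataType
        self.nullable = nullable

    def __repr__(self):
        # %r quotes the name ('f1') and renders the bool as True/False,
        # fixing the Scala-style f1 / true output
        return "StructField(%r, %r, %r)" % (self.name, self.dataType, self.nullable)

    def __eq__(self, other):
        return isinstance(other, StructField) and (
            (self.name, self.dataType, self.nullable)
            == (other.name, other.dataType, other.nullable)
        )


class StructType:
    def __init__(self, fields=None):
        self.fields = list(fields or [])

    def __repr__(self):
        # A Python list literal instead of Scala's List(...)
        return "StructType([%s])" % ", ".join(repr(f) for f in self.fields)

    def __eq__(self, other):
        return isinstance(other, StructType) and self.fields == other.fields


struct1 = StructType([StructField("f1", StringType(), True)])
text = repr(struct1)
print(text)  # StructType([StructField('f1', StringType(), True)])
struct2 = eval(text)  # works because the class names are in scope
```

Because each __repr__ delegates to repr() of its components, nested types compose without extra code. Note that eval() of such a string still only works where the type names are importable in the evaluating scope, e.g. after importing them from pyspark.sql.types.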