Ted Chester Jenks created SPARK-44142:
-----------------------------------------

             Summary: Utility to convert python types to spark types compares Python "type" object rather than user's "tpe" for categorical data types
                 Key: SPARK-44142
                 URL: https://issues.apache.org/jira/browse/SPARK-44142
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.4.0
            Reporter: Ted Chester Jenks


In the typehints utility that converts Python types to Spark types, the branch:

{code:python}
# categorical types
elif isinstance(tpe, CategoricalDtype) or (isinstance(tpe, str) and type == "category"):
    return types.LongType()
{code}

compares Python's built-in 'type' object (a built-in, not a keyword) to the string "category". That comparison is always false, so the string "category" never matches this branch. The user's type is actually stored in the variable 'tpe', so the comparison should be made against 'tpe'.

See line [here|https://github.com/apache/spark/blob/1b4048bf62dddae7d324c4b12aa409a1bd456dc5/python/pyspark/pandas/typedef/typehints.py#LL217C7-L217C7].
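
For illustration, the behaviour can be reproduced without PySpark or pandas. The snippet below is a minimal sketch, not the PySpark source: the function names is_categorical_buggy and is_categorical_fixed are hypothetical stand-ins that isolate only the string half of the condition quoted in the report.

```python
# Hypothetical stand-ins for the string check in the reported branch.
# These are illustrations only; they are not functions from PySpark.


def is_categorical_buggy(tpe):
    # Mirrors the reported condition: the built-in `type` class is
    # compared to a string, so the result is False for every input.
    return isinstance(tpe, str) and type == "category"


def is_categorical_fixed(tpe):
    # Compares the caller's argument `tpe` instead of the built-in.
    return isinstance(tpe, str) and tpe == "category"


if __name__ == "__main__":
    print(is_categorical_buggy("category"))  # False
    print(is_categorical_fixed("category"))  # True
    print(is_categorical_fixed("int64"))     # False
```

Because 'type' is an ordinary built-in name rather than a reserved word, the original expression is valid Python and raises no error, which would explain how the typo could go unnoticed.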