[ https://issues.apache.org/jira/browse/SPARK-44142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-44142:
------------------------------------

    Assignee: Ted Chester Jenks

> Utility to convert python types to spark types compares Python "type" object rather than user's "tpe" for categorical data types
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-44142
>                 URL: https://issues.apache.org/jira/browse/SPARK-44142
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Ted Chester Jenks
>            Assignee: Ted Chester Jenks
>            Priority: Major
>             Fix For: 3.3.3, 3.4.1, 3.5.0
>
>
> In the typehints utility that converts Python types to Spark types, the line:
> {code:python}
> # categorical types
> elif isinstance(tpe, CategoricalDtype) or (isinstance(tpe, str) and type == "category"):
>     return types.LongType()
> {code}
> compares Python's built-in 'type' object, not the user's value, against the string "category". 'type' is a built-in class rather than a keyword, and a class never equals a string, so this comparison is always false and the string "category" is never recognized as a categorical type. The user's type is actually stored in the variable 'tpe', which is what the comparison should use.
>
>
> See line [here|https://github.com/apache/spark/blob/1b4048bf62dddae7d324c4b12aa409a1bd456dc5/python/pyspark/pandas/typedef/typehints.py#LL217C7-L217C7].
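
The behavior described in the report can be reproduced outside Spark. The following is a minimal sketch, not the actual PySpark code: the function names and the stand-in CategoricalDtype class are hypothetical, and the functions return the string "LongType" in place of types.LongType() so that the example runs without pandas or pyspark installed.

```python
from typing import Any, Optional


class CategoricalDtype:
    """Hypothetical stand-in for pandas' CategoricalDtype (assumption, not the real class)."""


def spark_type_buggy(tpe: Any) -> Optional[str]:
    # Mirrors the condition quoted in the report: `type` here is Python's
    # built-in class, not the argument, so the string branch never matches.
    if isinstance(tpe, CategoricalDtype) or (isinstance(tpe, str) and type == "category"):
        return "LongType"
    return None


def spark_type_fixed(tpe: Any) -> Optional[str]:
    # Compares the caller's value `tpe` against "category" instead.
    if isinstance(tpe, CategoricalDtype) or (isinstance(tpe, str) and tpe == "category"):
        return "LongType"
    return None


print(type == "category")            # False: a class object never equals a str
print(spark_type_buggy("category"))  # None: the string form is not recognized
print(spark_type_fixed("category"))  # LongType
```

Both versions still handle a CategoricalDtype instance through the isinstance check, so only callers that pass the string "category" are affected by the comparison.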