Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20163#discussion_r161365496

```diff
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala ---
@@ -144,6 +145,7 @@ object EvaluatePython {
     }

     case StringType => (obj: Any) => nullSafeConvert(obj) {
+      case _: Calendar => null
       case _ => UTF8String.fromString(obj.toString)
```

--- End diff --

@cloud-fan, how about something like this then?

```scala
case StringType => (obj: Any) => nullSafeConvert(obj) {
  // Shortcut for string conversion
  case c: String => UTF8String.fromString(c)
  // Here, we return null for 'array', 'tuple', 'dict', 'list', 'datetime.datetime',
  // 'datetime.date' and 'datetime.time' because those string conversions are
  // not quite consistent with the SQL string representation of data.
  case _: java.util.Calendar | _: net.razorvine.pickle.objects.Time |
       _: java.util.List[_] | _: java.util.Map[_, _] => null
  case c if c.getClass.isArray => null
  // Here, we keep the string conversion fallback for compatibility.
  // TODO: We should revisit this and rewrite the type conversion logic in Spark 3.x.
  case other => UTF8String.fromString(other.toString)
}
```

A few tests:

`datetime.time`:

```python
from pyspark.sql.functions import udf
from datetime import time

f = udf(lambda x: time(0, 0), "string")
spark.range(1).select(f("id")).show()
```

```
+--------------------+
|        <lambda>(id)|
+--------------------+
|Time: 0 hours, 0 ...|
+--------------------+
```

`array`:

```python
from pyspark.sql.functions import udf
import array

f = udf(lambda x: array.array("c", "aaa"), "string")
spark.range(1).select(f("id")).show()
```

```
+------------+
|<lambda>(id)|
+------------+
| [C@11618d9e|
+------------+
```

`tuple`:

```python
from pyspark.sql.functions import udf

f = udf(lambda x: (x,), "string")
spark.range(1).select(f("id")).show()
```

```
+--------------------+
|        <lambda>(id)|
+--------------------+
|[Ljava.lang.Objec...|
+--------------------+
```

`list`:

```python
from pyspark.sql.functions import udf
from datetime import datetime

f = udf(lambda x: [datetime(1990, 1, 1)], "string")
spark.range(1).select(f("id")).show()
```

```
+--------------------+
|        <lambda>(id)|
+--------------------+
|[java.util.Gregor...|
+--------------------+
```
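For illustration only, here is a rough plain-Python analogue of the pattern match proposed above (the function name `to_sql_string` is hypothetical, and this is not Spark's actual code path — the real conversion happens on the JVM side after the pickled Python objects arrive as `Calendar`, `List`, `Map`, arrays, etc.):

```python
from datetime import date, datetime, time

def to_sql_string(obj):
    """Hypothetical sketch of the proposed StringType converter:
    short-circuit real strings, return None (SQL NULL) for container
    and date/time types whose default string form is inconsistent
    with the SQL string representation, and keep the str() fallback
    for everything else for compatibility."""
    if obj is None:
        return None
    # Shortcut for string conversion
    if isinstance(obj, str):
        return obj
    # Types whose string conversions are not consistent with SQL
    if isinstance(obj, (list, tuple, dict, datetime, date, time)):
        return None
    # Fallback string conversion, kept for compatibility
    return str(obj)
```

So, for example, an integer still falls back to its string form, while a list or a `datetime.time` yields NULL instead of an arbitrary `toString`-style rendering.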