Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20024#discussion_r159589126

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -203,9 +203,26 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
       // UDFToString
       private[this] def castToString(from: DataType): Any => Any = from match {
         case BinaryType => buildCast[Array[Byte]](_, UTF8String.fromBytes)
    +    case StringType => buildCast[UTF8String](_, identity)
         case DateType => buildCast[Int](_, d => UTF8String.fromString(DateTimeUtils.dateToString(d)))
    --- End diff --

    We may convert a string to `UTF8String` and then convert it back, which is inefficient. I think we should create a special `StringBuilder` for `UTF8String`, e.g.
    ```
    class UTF8StringBuilder {
      public void append(UTF8String str)
    }
    ```
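
A minimal Java sketch of the builder idea suggested above, under stated assumptions: the class name `Utf8BytesBuilder` and its methods are hypothetical and are not Spark's API, and a raw `byte[]` holding UTF-8 data stands in for `UTF8String` so the example compiles without Spark on the classpath. It only illustrates how appending already-encoded bytes avoids a decode and re-encode round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not Spark code: accumulates UTF-8 bytes directly so that
// already-encoded input is copied as-is instead of being decoded to
// java.lang.String and encoded again.
final class Utf8BytesBuilder {
    private byte[] buffer = new byte[16];
    private int length = 0;

    // Grow the backing array (at least doubling) so `extra` more bytes fit.
    private void ensureCapacity(int extra) {
        int needed = length + extra;
        if (needed > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(needed, buffer.length * 2));
        }
    }

    // Append bytes that are already UTF-8 encoded (stand-in for a UTF8String
    // argument): a plain memory copy, no transcoding.
    Utf8BytesBuilder append(byte[] utf8) {
        ensureCapacity(utf8.length);
        System.arraycopy(utf8, 0, buffer, length, utf8.length);
        length += utf8.length;
        return this;
    }

    // Convenience overload for Java strings such as delimiters; encodes once.
    Utf8BytesBuilder append(String s) {
        return append(s.getBytes(StandardCharsets.UTF_8));
    }

    // Return a copy of the accumulated UTF-8 bytes, trimmed to the used length.
    byte[] build() {
        return Arrays.copyOf(buffer, length);
    }
}
```

In a cast such as array-to-string, the element values would go through the `byte[]` overload and only the brackets and separators through the `String` overload, so element data is never transcoded.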