YutaLin commented on PR #4315: URL: https://github.com/apache/datafusion-comet/pull/4315#issuecomment-4445627791
Hi @andygrove, thanks for the review! I've extracted the encode method and added a null check.

Regarding "Spark accepts utf8 as an alias for UTF-8": Spark only accepts aliases in 3.5 and earlier, because it delegates to the JDK's `Charset.forName`. Since 4.0 it validates the charset name against a whitelist, so aliases are no longer supported. I'd suggest we support only `utf-8` for now, WDYT?

https://spark.apache.org/docs/4.0.0/sql-migration-guide.html#upgrading-from-spark-sql-35-to-40

> Since Spark 4.0, the encode() and decode() functions support only the following charsets ‘US-ASCII’, ‘ISO-8859-1’, ‘UTF-8’, ‘UTF-16BE’, ‘UTF-16LE’, ‘UTF-16’, ‘UTF-32’. To restore the previous behavior when the function accepts charsets of the current JDK used by Spark, set spark.sql.legacy.javaCharsets to true.
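For reference, here's a minimal, self-contained sketch (not Comet's or Spark's actual code; the object name and whitelist value are just for illustration, taken from the migration guide quoted above) showing why the alias worked pre-4.0 via `Charset.forName` and why a name-based whitelist rejects it:

```scala
import java.nio.charset.Charset

object CharsetAliasDemo {
  // Charsets listed in the Spark 4.0 migration guide for encode()/decode().
  private val spark40Whitelist =
    Set("US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16", "UTF-32")

  def main(args: Array[String]): Unit = {
    // Pre-4.0 behavior: the JDK resolves aliases, so "utf8" maps to the UTF-8 charset.
    println(Charset.forName("utf8").name()) // prints "UTF-8"

    // 4.0+ behavior (sketch): the charset name itself must be on the whitelist,
    // so the alias "utf8" is rejected even though the JDK understands it.
    println(spark40Whitelist.contains("utf8"))  // false
    println(spark40Whitelist.contains("UTF-8")) // true
  }
}
```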
