YutaLin commented on PR #4315:
URL: https://github.com/apache/datafusion-comet/pull/4315#issuecomment-4445627791

   Hi @andygrove, thanks for the review!
   I've extracted the `encode` method and added a null check.
   
   Regarding "Spark accepts utf8 as an alias for UTF-8": Spark only accepts aliases up through 3.5, because it uses the JDK's `Charset.forName`. Since 4.0 it validates the charset name against an allowlist, so aliases are no longer accepted. I'd suggest we support only `utf-8` for now, WDYT?
   
   
https://spark.apache.org/docs/4.0.0/sql-migration-guide.html#upgrading-from-spark-sql-35-to-40
   > Since Spark 4.0, the encode() and decode() functions support only the following charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32'. To restore the previous behavior when the function accepts charsets of the current JDK used by Spark, set spark.sql.legacy.javaCharsets to true.
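
   For illustration, here is a minimal sketch (not the actual Spark or Comet implementation) of what a 4.0-style allowlist check could look like, using the charset list from the migration guide quoted above. The object and method names are hypothetical, and the case normalization is an assumption:

```scala
object CharsetCheck {
  // Charsets allowed by Spark 4.0's encode()/decode(), per the migration guide.
  private val allowed = Set(
    "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16", "UTF-32"
  )

  // Accepts only the allowlisted canonical names (case-insensitively).
  // Unlike JDK Charset.forName, aliases such as "utf8" are NOT resolved.
  def isSupported(charset: String): Boolean =
    allowed.contains(charset.toUpperCase(java.util.Locale.ROOT))
}
```

   Under this check, `"utf-8"` passes but the alias `"utf8"` is rejected, which is the behavior difference being discussed.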


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

