Dongjoon Hyun created SPARK-53213:
-------------------------------------
Summary: Use Java `Base64` instead of `Base64.(en|decodeBase64)*`
Key: SPARK-53213
URL: https://issues.apache.org/jira/browse/SPARK-53213
Project: Spark
Issue Type: Sub-task
Components: Kubernetes, Spark Core, SQL
Affects Versions: 4.1.0
Reporter: Dongjoon Hyun
The Java native API is nearly **9x faster** than `Commons Codec` (10121 ms vs. 1156 ms for an encode/decode round trip of a 1 GB byte array):
{code}
scala> val a = new Array[Byte](1_000_000_000)
scala> spark.time(org.apache.commons.codec.binary.Base64.decodeBase64(org.apache.commons.codec.binary.Base64.encodeBase64String(a)).length)
Time taken: 10121 ms
val res0: Int = 1000000000
scala> spark.time(java.util.Base64.getDecoder().decode(java.util.Base64.getEncoder().encodeToString(a)).length)
Time taken: 1156 ms
val res1: Int = 1000000000
{code}
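For illustration only, a minimal sketch of what a call-site replacement might look like. The object and helper names (`Base64Sketch`, `encode`, `decode`, `decodeLenient`) are hypothetical and not part of Spark. One behavioral difference may matter during migration: `java.util.Base64.getDecoder()` rejects characters outside the Base64 alphabet (including line breaks), whereas Commons Codec's `decodeBase64` skips them. `getMimeDecoder()` is shown as one possible lenient alternative, on the assumption that some inputs contain line breaks.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

// Hypothetical helpers for illustration; not Spark APIs.
object Base64Sketch {
  // Stands in for Commons Codec's Base64.encodeBase64String(bytes)
  def encode(bytes: Array[Byte]): String =
    Base64.getEncoder.encodeToString(bytes)

  // Stands in for Commons Codec's Base64.decodeBase64(s).
  // Strict: throws IllegalArgumentException on non-alphabet characters.
  def decode(s: String): Array[Byte] =
    Base64.getDecoder.decode(s)

  // Lenient variant (an assumption about what some callers may need):
  // the MIME decoder ignores line separators and other non-alphabet characters.
  def decodeLenient(s: String): Array[Byte] =
    Base64.getMimeDecoder.decode(s)

  def main(args: Array[String]): Unit = {
    val payload = "spark".getBytes(UTF_8)
    val encoded = encode(payload)
    // Round trip through the strict decoder
    println(new String(decode(encoded), UTF_8))
    // Input with an embedded line break only decodes with the lenient variant
    val wrapped = encoded.take(4) + "\n" + encoded.drop(4)
    println(new String(decodeLenient(wrapped), UTF_8))
  }
}
```

Whether a given call site needs the strict or the lenient decoder depends on where its input comes from, so each replacement would need to be checked individually.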