Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-08 Thread via GitHub
dongjoon-hyun commented on code in PR #45408: URL: https://github.com/apache/spark/pull/45408#discussion_r1518435444 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3318,6 +3318,13 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-08 Thread via GitHub
dongjoon-hyun commented on code in PR #45408: URL: https://github.com/apache/spark/pull/45408#discussion_r1518435595 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3318,6 +3318,13 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-08 Thread via GitHub
dongjoon-hyun commented on code in PR #45408: URL: https://github.com/apache/spark/pull/45408#discussion_r1518435187 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3318,6 +3318,13 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-08 Thread via GitHub
dongjoon-hyun commented on code in PR #45408: URL: https://github.com/apache/spark/pull/45408#discussion_r1518433859 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3318,6 +3318,13 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-08 Thread via GitHub
ted-jenks commented on code in PR #45408: URL: https://github.com/apache/spark/pull/45408#discussion_r1517931178 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2426,21 +2426,34 @@ case class Chr(child: Expression)

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-08 Thread via GitHub
cloud-fan commented on code in PR #45408: URL: https://github.com/apache/spark/pull/45408#discussion_r1517826436 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2426,21 +2426,34 @@ case class Chr(child: Expression)

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-08 Thread via GitHub
cloud-fan commented on code in PR #45408: URL: https://github.com/apache/spark/pull/45408#discussion_r1517664214 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2426,21 +2426,27 @@ case class Chr(child: Expression)

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-08 Thread via GitHub
ted-jenks commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1985424230 I think making this configurable makes the most sense. People processing data for external systems with Spark can choose whether to chunk the data based on what the use-case
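
To make the "chunk or not chunk" choice concrete, here is a minimal sketch in plain Java of selecting a JDK encoder from a boolean flag. The class name, method names, and the `chunked` flag are hypothetical and used only for illustration; they are not the configuration or code from the PR.

```java
import java.util.Base64;

class ConfigurableBase64 {
    // `chunked` is a hypothetical flag standing in for a user-facing setting.
    // true  -> RFC 2045 (MIME) encoder, which breaks output into lines with CRLF
    // false -> RFC 4648 basic encoder, which emits one unbroken string
    static Base64.Encoder encoderFor(boolean chunked) {
        return chunked ? Base64.getMimeEncoder() : Base64.getEncoder();
    }

    static String encode(byte[] input, boolean chunked) {
        return encoderFor(chunked).encodeToString(input);
    }

    public static void main(String[] args) {
        byte[] payload = new byte[90]; // 90 bytes encode to 120 base64 characters
        System.out.println("chunked has CRLF:   " + encode(payload, true).contains("\r\n"));
        System.out.println("unchunked has CRLF: " + encode(payload, false).contains("\r\n"));
    }
}
```

Both encoders produce the same base64 characters; they differ only in whether line separators are inserted.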

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984930529 As the Spark Community didn't receive any issue reports during the v3.3.0 - v3.5.1 releases, I think this is a corner case. Maybe we can make the config internal. -- This is an automated

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984929745 +1 for the direction if we need to support both.

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984926315 Thank you @dongjoon-hyun. In such circumstances, I guess we can add a configuration for base64 classes to avoid breaking things again. AFAIK, Apache Hive also uses the JDK

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984433848 Thank you for the confirmation, @ted-jenks. Well, in this case, it's too late to change the behavior again. Apache Spark 3.3 has already been EOL since last year and I don't

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
ted-jenks commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1982836579 @dongjoon-hyun > It sounds like you have other systems to read Spark's data. Correct. The issue was that from 3.2 to 3.3 there was a behavior change in the base64 encodings used
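
For readers unfamiliar with the two RFCs in the subject line, the following sketch shows how the JDK's RFC 2045 (MIME) and RFC 4648 (basic) encoders differ, and why a strict reader can reject MIME-style output. It uses only `java.util.Base64`. The class and method names are hypothetical, and the source does not state how the external systems in this thread actually failed; a strict basic decoder is just one assumed example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64LineBreakDemo {
    // RFC 2045 (MIME): output lines are at most 76 characters, separated by CRLF.
    static String encodeMime(byte[] input) {
        return Base64.getMimeEncoder().encodeToString(input);
    }

    // RFC 4648 basic: a single continuous string with no line separators.
    static String encodeBasic(byte[] input) {
        return Base64.getEncoder().encodeToString(input);
    }

    // Returns true if the strict RFC 4648 decoder accepts the text as-is.
    static boolean basicDecoderAccepts(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return true;
        } catch (IllegalArgumentException e) {
            // The basic decoder rejects characters outside the base64 alphabet,
            // including the CR and LF inserted by the MIME encoder.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] input = "a".repeat(100).getBytes(StandardCharsets.UTF_8);
        String mime = encodeMime(input);
        String basic = encodeBasic(input);
        System.out.println("MIME length:  " + mime.length());
        System.out.println("basic length: " + basic.length());
        System.out.println("strict decoder accepts MIME output:  " + basicDecoderAccepts(mime));
        System.out.println("strict decoder accepts basic output: " + basicDecoderAccepts(basic));
    }
}
```

Inputs that encode to 76 characters or fewer look identical under both encoders, so the difference only shows up on longer values.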

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-06 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1981829489 Hi, @ted-jenks. Could you elaborate on your correctness situation a little more? It sounds like you have other systems to read Spark's data.

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-06 Thread via GitHub
ted-jenks commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1981542843 @dongjoon-hyun please may you take a look. Caused a big data correctness issue for us.

[PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-06 Thread via GitHub
ted-jenks opened a new pull request, #45408: URL: https://github.com/apache/spark/pull/45408 ### What changes were proposed in this pull request? [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder ### Why are the changes needed? In