This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 11d2f7316d6 [SPARK-43751][SQL][DOC] Document `unbase64` behavior change
11d2f7316d6 is described below

commit 11d2f7316d6ddd4a00853deb74b4a65f6f4c899c
Author: Cheng Pan <cheng...@apache.org>
AuthorDate: Fri May 26 11:33:38 2023 +0800

    [SPARK-43751][SQL][DOC] Document `unbase64` behavior change
    
    ### What changes were proposed in this pull request?
    
    After SPARK-37820, `select unbase64("abcs==")` (malformed input) always throws an exception; this PR does not help in that case, as it only improves the error message for `to_binary()`.
    
    So, `unbase64()`'s behavior for malformed input changed silently after SPARK-37820, as illustrated below:
    - before: return a best-effort result, because it uses the [LENIENT](https://github.com/apache/commons-codec/blob/rel/commons-codec-1.15/src/main/java/org/apache/commons/codec/binary/Base64InputStream.java#L46) policy: any trailing bits are composed into 8-bit bytes where possible, and the remainder is discarded.
    - after: throw an exception
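
    For example, with the malformed input quoted above (a sketch; the exact bytes of the lenient result and the error class may vary by version):

    ```sql
    -- Malformed Base64 input from the example above.
    SELECT unbase64('abcs==');
    -- Spark 3.2 and earlier: returns a best-effort binary result (LENIENT decoding).
    -- Spark 3.3 and later:   the query fails with an exception.
    ```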
    
    And there is no way to restore the previous behavior. To tolerate malformed input, users should migrate `unbase64(<input>)` to `try_to_binary(<input>, 'base64')`, which returns NULL instead of failing with an exception.
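
    A minimal migration sketch, assuming Spark 3.3+ where `try_to_binary` is available:

    ```sql
    -- Tolerate malformed input: returns NULL instead of raising an exception.
    SELECT try_to_binary('abcs==', 'base64');

    -- Well-formed input still decodes as expected.
    SELECT try_to_binary('U3Bhcms=', 'base64');  -- the bytes of 'Spark'
    ```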
    
    ### Why are the changes needed?
    
    Add the behavior change to migration guide.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes.
    
    ### How was this patch tested?
    
    Manually reviewed.
    
    Closes #41280 from pan3793/SPARK-43751.
    
    Authored-by: Cheng Pan <cheng...@apache.org>
    Signed-off-by: Kent Yao <y...@apache.org>
    (cherry picked from commit af6c1ec7c795584c28e15e4963eed83917e2f06a)
    Signed-off-by: Kent Yao <y...@apache.org>
---
 docs/sql-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 5c46343d994..02648a8d7e6 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -68,6 +68,8 @@ license: |
   
   - Since Spark 3.3, the precision of the return type of round-like functions has been fixed. This may cause Spark to throw `AnalysisException` of the `CANNOT_UP_CAST_DATATYPE` error class when using views created by prior versions. In such cases, you need to recreate the views using ALTER VIEW AS or CREATE OR REPLACE VIEW AS with newer Spark versions.
 
+  - Since Spark 3.3, the `unbase64` function throws an error for a malformed `str` input. Use `try_to_binary(<str>, 'base64')` to tolerate malformed input and return NULL instead. In Spark 3.2 and earlier, the `unbase64` function returns a best-effort result for a malformed `str` input.
+
   - Since Spark 3.3.1 and 3.2.3, for `SELECT ... GROUP BY a GROUPING SETS 
(b)`-style SQL statements, `grouping__id` returns different values from Apache 
Spark 3.2.0, 3.2.1, 3.2.2, and 3.3.0. It computes based on user-given group-by 
expressions plus grouping set columns. To restore the behavior before 3.3.1 and 
3.2.3, you can set `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`. For 
details, see [SPARK-40218](https://issues.apache.org/jira/browse/SPARK-40218) 
and [SPARK-40562](https:/ [...]
 
 ## Upgrading from Spark SQL 3.1 to 3.2


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
