[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DB Tsai updated SPARK-26706:
----------------------------
    Affects Version/s:     (was: 3.0.0)

> Fix Cast$mayTruncate for bytes
> ------------------------------
>
>                 Key: SPARK-26706
>                 URL: https://issues.apache.org/jira/browse/SPARK-26706
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.2, 2.4.0
>            Reporter: Anton Okolnychyi
>            Assignee: Anton Okolnychyi
>            Priority: Blocker
>              Labels: correctness
>             Fix For: 2.3.3, 2.4.1
>
>
> The logic in {{Cast$mayTruncate}} is broken for bytes.
> Right now, {{mayTruncate(LongType, ByteType)}} returns {{false}} while 
> {{mayTruncate(LongType, ShortType)}} returns {{true}}. Consequently, 
> {{spark.range(1, 3).as[Byte]}} and {{spark.range(1, 3).as[Short]}} will 
> behave differently.
> This bug can silently corrupt data: a {{Long}} value that does not fit in a {{Byte}} is truncated without any error.
> {code}
> // executes silently even though Long is converted into Byte
> spark.range(Long.MaxValue - 10, Long.MaxValue).as[Byte]
>   .map(b => b - 1)
>   .show()
> +-----+
> |value|
> +-----+
> |  -12|
> |  -11|
> |  -10|
> |   -9|
> |   -8|
> |   -7|
> |   -6|
> |   -5|
> |   -4|
> |   -3|
> +-----+
> // throws an AnalysisException: Cannot up cast `id` from bigint to smallint as it may truncate
> spark.range(Long.MaxValue - 10, Long.MaxValue).as[Short]
>   .map(s => s - 1)
>   .show()
> {code}
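The {{-12}} through {{-3}} output in the Byte example is exactly what plain two's-complement narrowing produces. The following Spark-free Scala sketch reproduces those numbers, assuming the bigint-to-tinyint conversion keeps only the low-order 8 bits, as Scala's {{Long.toByte}} does. {{toByteExact}} is a hypothetical helper, not a Spark or Scala API; it shows the kind of range check that would surface the problem instead of wrapping around.

```scala
object ByteTruncationDemo {
  // Narrowing Long -> Byte keeps only the low-order 8 bits (two's complement),
  // so values far outside [-128, 127] wrap around instead of failing.
  // Mirrors spark.range(Long.MaxValue - 10, Long.MaxValue): ten values, end exclusive.
  def truncatedThenDecremented: Seq[Int] =
    (0 until 10).map(i => (Long.MaxValue - 10 + i).toByte - 1)

  // Hypothetical checked alternative: reject values that do not fit in a Byte.
  def toByteExact(l: Long): Byte = {
    if (l < Byte.MinValue || l > Byte.MaxValue)
      throw new ArithmeticException(s"$l does not fit in a Byte")
    l.toByte
  }

  def main(args: Array[String]): Unit = {
    // Low byte of 0x7FFFFFFFFFFFFFF5 is 0xF5, i.e. -11 as a signed byte.
    println((Long.MaxValue - 10).toByte) // -11
    println(truncatedThenDecremented.mkString(", "))
    // -12, -11, -10, -9, -8, -7, -6, -5, -4, -3
  }
}
```

Because every value in the range shares the same all-ones high bits, the results differ only in the last byte, which is why the output looks like a small, plausible sequence rather than obvious garbage.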

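One way to see why only {{Byte}} slips through: suppose the truncation check compares positions in a narrowest-to-widest list of integral types. A guard written as {{toIdx > 0}} then never flags a target at index 0 (the narrowest type), while {{toIdx >= 0}} does. The sketch below is a hypothetical model with made-up type names ({{ByteT}}, {{ShortT}}, {{IntT}}, {{LongT}}), not Spark's implementation. Arguments are in (from, to) order, so the Long-to-Byte cast from the example is {{mayTruncate(LongT, ByteT)}}.

```scala
object MayTruncateSketch {
  // Hypothetical stand-ins for integral SQL types; these are NOT Spark classes.
  sealed trait IntegralT
  case object ByteT extends IntegralT
  case object ShortT extends IntegralT
  case object IntT extends IntegralT
  case object LongT extends IntegralT

  // Ordered narrowest to widest: casting to an earlier entry can lose information.
  val precedence: Seq[IntegralT] = Seq(ByteT, ShortT, IntT, LongT)

  // Off-by-one guard: `toIdx > 0` never flags a target at index 0 (ByteT).
  def mayTruncateBuggy(from: IntegralT, to: IntegralT): Boolean = {
    val fromIdx = precedence.indexOf(from)
    val toIdx = precedence.indexOf(to)
    toIdx > 0 && fromIdx > toIdx
  }

  // Corrected guard: any known target type counts, including index 0.
  def mayTruncate(from: IntegralT, to: IntegralT): Boolean = {
    val fromIdx = precedence.indexOf(from)
    val toIdx = precedence.indexOf(to)
    toIdx >= 0 && fromIdx > toIdx
  }

  def main(args: Array[String]): Unit = {
    println(mayTruncateBuggy(LongT, ShortT)) // true
    println(mayTruncateBuggy(LongT, ByteT))  // false: the Byte case is missed
    println(mayTruncate(LongT, ByteT))       // true
    println(mayTruncate(ByteT, LongT))       // false: widening never truncates
  }
}
```

With the buggy guard, Long to Short is rejected but Long to Byte is allowed, matching the asymmetry between {{as[Short]}} and {{as[Byte]}} described in the issue.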


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
