[ 
https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Okolnychyi updated SPARK-26706:
-------------------------------------
    Description: 
The logic in {{Cast$mayTruncate}} is broken for bytes.

Right now, {{mayTruncate(ByteType, LongType)}} returns {{false}} while 
{{mayTruncate(ShortType, LongType)}} returns {{true}}. Consequently, 
{{spark.range(1, 3).as[Byte]}} passes analysis while 
{{spark.range(1, 3).as[Short]}} fails with an {{AnalysisException}}.
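
One way such an asymmetry between bytes and shorts could arise is an off-by-one in an index-based precedence check. The sketch below is a hypothetical, self-contained model and not Spark's code: the type objects, the {{precedence}} list, and both functions are illustrative names, and passing the source type first and the target type second is an assumption.

```scala
object MayTruncateSketch {
  // Hypothetical stand-ins for integral types, ordered narrowest to widest.
  sealed trait IntegralType
  case object ByteT extends IntegralType
  case object ShortT extends IntegralType
  case object IntT extends IntegralType
  case object LongT extends IntegralType

  val precedence: IndexedSeq[IntegralType] = IndexedSeq(ByteT, ShortT, IntT, LongT)

  // Buggy variant: `t > 0` excludes the narrowest type, whose index is 0,
  // so a cast into it is never reported as truncating.
  def mayTruncateBuggy(from: IntegralType, to: IntegralType): Boolean = {
    val f = precedence.indexOf(from)
    val t = precedence.indexOf(to)
    t > 0 && f > t
  }

  // Corrected variant: `t >= 0` only requires that the target is a known type.
  def mayTruncateFixed(from: IntegralType, to: IntegralType): Boolean = {
    val f = precedence.indexOf(from)
    val t = precedence.indexOf(to)
    t >= 0 && f > t
  }

  def main(args: Array[String]): Unit = {
    println(mayTruncateBuggy(from = LongT, to = ByteT))  // narrowing to byte
    println(mayTruncateBuggy(from = LongT, to = ShortT)) // narrowing to short
    println(mayTruncateFixed(from = LongT, to = ByteT))
  }
}
```

In this model the buggy check flags the narrowing to short but not the narrowing to byte, which mirrors the difference described above.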

This bug can silently corrupt data.

{code}
// executes silently even though Long is converted into Byte
spark.range(Long.MaxValue - 10, Long.MaxValue).as[Byte]
  .map(b => b - 1)
  .show()
+-----+
|value|
+-----+
|  -12|
|  -11|
|  -10|
|   -9|
|   -8|
|   -7|
|   -6|
|   -5|
|   -4|
|   -3|
+-----+
// throws an AnalysisException: Cannot up cast `id` from bigint to smallint
// as it may truncate
spark.range(Long.MaxValue - 10, Long.MaxValue).as[Short]
  .map(s => s - 1)
  .show()
{code}
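
The values in the output above can be reproduced without Spark. This is a minimal sketch, assuming the Long-to-Byte cast keeps only the low 8 bits the way Scala's {{toByte}} does, which is consistent with the printed result; {{NarrowingDemo}} is an illustrative name.

```scala
object NarrowingDemo {
  // The same ten ids that spark.range(Long.MaxValue - 10, Long.MaxValue)
  // produces (the end bound is exclusive).
  val ids: Seq[Long] = (0 until 10).map(i => Long.MaxValue - 10 + i)

  // toByte keeps the low 8 bits, so 0x7FFF...F5 becomes 0xF5, i.e. -11.
  // `b - 1` then widens to Int, as in the map from the report.
  val narrowed: Seq[Int] = ids.map(id => id.toByte - 1)

  def main(args: Array[String]): Unit =
    narrowed.foreach(println) // prints -12 through -3, one value per line
}
```

Every input is close to {{Long.MaxValue}}, yet every result is a small negative number, so the original values are lost without any error.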

  was:
The logic in {{Cast$mayTruncate}} is broken for bytes.

Right now, {{mayTruncate(ByteType, LongType)}} returns {{false}} while 
{{mayTruncate(ShortType, LongType)}} returns {{true}}. Consequently, 
{{spark.range(1, 3).as[Byte]}} and {{spark.range(1, 3).as[Short]}} will behave 
differently.

Potentially, this bug can lead to silently corrupting someone's data.


> Fix Cast$mayTruncate for bytes
> ------------------------------
>
>                 Key: SPARK-26706
>                 URL: https://issues.apache.org/jira/browse/SPARK-26706
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.3, 2.4.1, 3.0.0
>            Reporter: Anton Okolnychyi
>            Priority: Major
>
> The logic in {{Cast$mayTruncate}} is broken for bytes.
> Right now, {{mayTruncate(ByteType, LongType)}} returns {{false}} while 
> {{mayTruncate(ShortType, LongType)}} returns {{true}}. Consequently, 
> {{spark.range(1, 3).as[Byte]}} passes analysis while 
> {{spark.range(1, 3).as[Short]}} fails with an {{AnalysisException}}.
> This bug can silently corrupt data.
> {code}
> // executes silently even though Long is converted into Byte
> spark.range(Long.MaxValue - 10, Long.MaxValue).as[Byte]
>   .map(b => b - 1)
>   .show()
> +-----+
> |value|
> +-----+
> |  -12|
> |  -11|
> |  -10|
> |   -9|
> |   -8|
> |   -7|
> |   -6|
> |   -5|
> |   -4|
> |   -3|
> +-----+
> // throws an AnalysisException: Cannot up cast `id` from bigint to smallint
> // as it may truncate
> spark.range(Long.MaxValue - 10, Long.MaxValue).as[Short]
>   .map(s => s - 1)
>   .show()
> {code}


