[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750466#comment-16750466 ]
Dongjoon Hyun commented on SPARK-26706:
---------------------------------------

I updated the affected versions, too.

> Fix Cast$mayTruncate for bytes
> ------------------------------
>
>                 Key: SPARK-26706
>                 URL: https://issues.apache.org/jira/browse/SPARK-26706
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.3, 2.4.1, 3.0.0
>            Reporter: Anton Okolnychyi
>            Assignee: Anton Okolnychyi
>            Priority: Blocker
>              Labels: correctness
>
> The logic in {{Cast$mayTruncate}} is broken for bytes.
> Right now, {{mayTruncate(ByteType, LongType)}} returns {{false}} while {{mayTruncate(ShortType, LongType)}} returns {{true}}. Consequently, {{spark.range(1, 3).as[Byte]}} and {{spark.range(1, 3).as[Short]}} will behave differently.
> Potentially, this bug can lead to silently corrupting someone's data.
> {code}
> // executes silently even though Long is converted into Byte
> spark.range(Long.MaxValue - 10, Long.MaxValue).as[Byte]
>   .map(b => b - 1)
>   .show()
> +-----+
> |value|
> +-----+
> |  -12|
> |  -11|
> |  -10|
> |   -9|
> |   -8|
> |   -7|
> |   -6|
> |   -5|
> |   -4|
> |   -3|
> +-----+
> // throws an AnalysisException: Cannot up cast `id` from bigint to smallint as it may truncate
> spark.range(Long.MaxValue - 10, Long.MaxValue).as[Short]
>   .map(s => s - 1)
>   .show()
> {code}
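
For readers without a Spark session at hand, the values in the quoted output can be reproduced with plain Scala. The sketch below is standalone and does not use Spark. The object name TruncationSketch, the string type names, and both helper functions are hypothetical and exist only for illustration. The comment about an off-by-one on index 0 is an assumption about how a byte-only asymmetry of this kind can arise, not a statement about Spark's actual source.

```scala
// Standalone sketch, no Spark dependency. All names here are hypothetical.
object TruncationSketch {
  // Integral types ordered from narrowest to widest.
  val widthOrder: Vector[String] = Vector("byte", "short", "int", "long")

  // A cast may truncate when the target is strictly narrower than the source.
  // Both bounds use `>= 0`: index 0 (byte) is a valid position, so a bound
  // written as `> 0` would treat byte as if it were not in the list at all
  // (assumed failure mode, for illustration only).
  def mayTruncate(from: String, to: String): Boolean = {
    val fromIdx = widthOrder.indexOf(from)
    val toIdx = widthOrder.indexOf(to)
    fromIdx >= 0 && toIdx >= 0 && fromIdx > toIdx
  }

  // The value produced when a Long is silently narrowed to Byte and then
  // decremented, mirroring `.as[Byte].map(b => b - 1)` in the report.
  def narrowThenDecrement(v: Long): Int = v.toByte - 1

  def main(args: Array[String]): Unit = {
    // Narrowing casts are flagged for byte and short alike.
    println(mayTruncate("long", "byte"))
    println(mayTruncate("long", "short"))
    // Widening casts are not flagged.
    println(mayTruncate("byte", "long"))
    // Same input range as spark.range(Long.MaxValue - 10, Long.MaxValue).
    (Long.MaxValue - 10 until Long.MaxValue).foreach { v =>
      println(narrowThenDecrement(v))
    }
  }
}
```

Narrowing keeps only the low 8 bits of each Long, which is why the first value, Long.MaxValue - 10, becomes -11 as a Byte and -12 after the decrement, matching the first row of the quoted table.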