[ https://issues.apache.org/jira/browse/SPARK-23498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382912#comment-16382912 ]
Dongjoon Hyun edited comment on SPARK-23498 at 3/2/18 12:12 AM:
----------------------------------------------------------------

[~KevinZwx]. Did you see HIVE-17186? Hive doesn't give you the correct result either. For the following TPC-H query, it will exclude '0.07'.

{code}
select l_discount from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01
{code}

I'm not disagreeing with this issue. I want to give you a well-known example where you cannot trust the Hive result.

was (Author: dongjoon):
[~KevinZwx]. Did you see HIVE-17186? Hive doesn't give you the correct result either. For the following TPC-H query, it will exclude '0.07'.

{code}
select l_discount from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01
{code}

> Accuracy problem in comparison with string and integer
> ------------------------------------------------------
>
>                 Key: SPARK-23498
>                 URL: https://issues.apache.org/jira/browse/SPARK-23498
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Kevin Zhang
>            Priority: Major
>
> When comparing a string column with an integer value, Spark SQL automatically casts the string operand to int, so the following SQL returns true in Hive but false in Spark:
>
> {code:java}
> select '1000.1'>1000
> {code}
>
> From the physical plan we can see that the string operand was cast to int, which caused the accuracy loss:
>
> {code:java}
> *Project [false AS (CAST(1000.1 AS INT) > 1000)#4]
> +- Scan OneRowRelation[]
> {code}
>
> To solve this, casting both operands of the binary operator to a wider common type such as double may be safe.
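>
> For illustration, a minimal sketch of the reported behavior and the proposed widening, assuming a local SparkSession as in spark-shell (the object name ComparisonDemo is chosen here for the example only):
>
> {code:scala}
> import org.apache.spark.sql.SparkSession
>
> // Hypothetical demo; spark.sql and show are standard Spark SQL APIs.
> object ComparisonDemo {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder()
>       .appName("SPARK-23498-demo")
>       .master("local[*]")
>       .getOrCreate()
>
>     // On the affected versions the string operand is cast to int,
>     // so '1000.1' becomes 1000 and the comparison yields false.
>     spark.sql("select '1000.1' > 1000").show()
>
>     // Casting both operands to a wider type such as double keeps the
>     // fractional part, so the comparison yields true as expected.
>     spark.sql("select cast('1000.1' as double) > cast(1000 as double)").show()
>
>     spark.stop()
>   }
> }
> {code}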