[ https://issues.apache.org/jira/browse/SPARK-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353030#comment-14353030 ]
Cheng Lian edited comment on SPARK-6201 at 3/9/15 5:13 PM:
-----------------------------------------------------------

Played with Hive's implicit type conversion a bit more and found that Hive actually converts the numeric constants to strings in your case:

{code:sql}
hive> create table t1 as select '1.00' as c1;
hive> select * from t1 where c1 in (1.0);
{code}

If {{c1}} were converted to a numeric type, the row {{1.00}} would appear in the result. However, the result set is empty.

For the expression {{"1.00" IN (1.0)}}, a {{GenericUDFIn}} instance is created and called with the argument list {{("1.00", 1.0)}}. {{GenericUDFIn.initialize}} then tries to convert all arguments into a common data type, from left to right. Since a double is allowed to be converted into a string, {{1.0}} is converted into the string {{"1.0"}}. That string does not equal {{"1.00"}}, so no row matches.
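To make the coercion order concrete, here is a toy Scala model of the behavior described above. It is not Hive's {{GenericUDFIn}} code: the two-type system, the names {{InCoercionSketch}}, {{Lit}}, {{commonType}} and {{convert}}, and the pairwise rule that a string argument makes the common type string are assumptions made for illustration, based only on the statement that a double may be converted to a string.

```scala
object InCoercionSketch {
  // Hypothetical two-type model; Hive's real type system is much richer.
  sealed trait HiveType
  case object StringT extends HiveType
  case object DoubleT extends HiveType

  // A constant or column value tagged with its static type.
  final case class Lit(value: Any, tpe: HiveType)

  // Resolve a common type by folding over the arguments from left to right.
  // Assumed pairwise rule: a double may be converted to a string, so any
  // string argument makes the common type string.
  def commonType(args: Seq[Lit]): HiveType =
    args.map(_.tpe).reduceLeft[HiveType] { (acc, t) =>
      if (acc == StringT || t == StringT) StringT else DoubleT
    }

  // Convert one value to the resolved common type.
  def convert(l: Lit, target: HiveType): Any = (l.tpe, target) match {
    case (DoubleT, StringT) => l.value.asInstanceOf[Double].toString // 1.0 becomes "1.0"
    case (StringT, DoubleT) => l.value.asInstanceOf[String].toDouble
    case _                  => l.value
  }

  // Evaluate `lhs IN (rhs, ...)` after converting every argument to the common type.
  def in(lhs: Lit, rhs: Seq[Lit]): Boolean = {
    val target = commonType(lhs +: rhs)
    val left = convert(lhs, target)
    rhs.exists(r => convert(r, target) == left)
  }

  def main(args: Array[String]): Unit = {
    val c1 = Lit("1.00", StringT)
    // String comparison "1.00" vs "1.0": no match, hence the empty result set.
    println(in(c1, Seq(Lit(1.0, DoubleT)))) // prints false
    // Numeric comparison, had c1 been converted to double instead: 1.00 == 1.0.
    println(convert(c1, DoubleT) == 1.0) // prints true
  }
}
```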
References:
# [Implicit type coercion support in existing database systems|http://chapeau.freevariable.com/2014/08/existing-system-coercion.html] by William Benton
# [{{GenericUDFIn.initialize}}|https://github.com/apache/hive/blob/release-0.13.1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIn.java#L84-L100]

> INSET should coerce types
> -------------------------
>
> Key: SPARK-6201
> URL: https://issues.apache.org/jira/browse/SPARK-6201
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.0, 1.3.0, 1.2.1
> Reporter: Jianshi Huang
>
> Suppose we have the following table:
> {code}
> sqlc.jsonRDD(sc.parallelize(Seq("{\"a\": \"1\"}", "{\"a\": \"2\"}", "{\"a\": \"3\"}"))).registerTempTable("d")
> {code}
> The schema is
> {noformat}
> root
>  |-- a: string (nullable = true)
> {noformat}
> Then,
> {code}
> sql("select * from d where (d.a = 1 or d.a = 2)").collect
> =>
> Array([1], [2])
> {code}
> where d.a and the constants 1 and 2 are first cast to Double and then compared, as the plan shows:
> {noformat}
> Filter ((CAST(a#155, DoubleType) = CAST(1, DoubleType)) || (CAST(a#155, DoubleType) = CAST(2, DoubleType)))
> {noformat}
> However, if I use
> {code}
> sql("select * from d where d.a in (1,2)").collect
> {code}
> the result is empty.
> The physical plan shows it's using INSET:
> {noformat}
> == Physical Plan ==
> Filter a#155 INSET (1,2)
>  PhysicalRDD [a#155], MappedRDD[499] at map at JsonRDD.scala:47
> {noformat}
> *It seems the INSET implementation in Spark SQL doesn't coerce types implicitly, whereas Hive does. We should make Spark SQL conform to Hive's behavior, even though implicit coercion here is very confusing when comparing a String with an Int.*
> Jianshi
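The gap between the {{OR}} query and the {{IN}} query in the quoted report comes down to whether the string column is cast before comparing. The toy Scala model below is not Spark's {{InSet}} implementation; the object and method names are hypothetical, and {{toDouble}} only stands in for {{CAST(..., DoubleType)}}. It illustrates why an uncoerced set lookup of the string {{"1"}} against the integers {{1}} and {{2}} finds nothing, while casting both sides to double, as in the {{OR}} plan, matches the first two rows.

```scala
object InSetCoercionSketch {
  // Column `a` from the report: the JSON loader typed these values as strings.
  val rows: Seq[String] = Seq("1", "2", "3")
  // The IN-list constants, typed as integers.
  val constants: Seq[Int] = Seq(1, 2)

  // No coercion: membership uses plain equality, and a String never equals
  // an Int, so no row matches (the empty result reported for `d.a in (1,2)`).
  private val rawSet: Set[Any] = constants.toSet[Any]
  def inSetWithoutCoercion(a: String): Boolean = rawSet.contains(a)

  // Coercion in the style of the OR query's plan: both sides are converted to
  // Double before comparing. Assumption: every value parses as a number
  // (a real CAST would yield null on malformed input instead of throwing).
  private val doubleSet: Set[Double] = constants.map(_.toDouble).toSet
  def inSetWithCoercion(a: String): Boolean = doubleSet.contains(a.toDouble)

  def main(args: Array[String]): Unit = {
    println(rows.filter(inSetWithoutCoercion)) // prints List()
    println(rows.filter(inSetWithCoercion))    // prints List(1, 2)
  }
}
```

Casting to double mirrors the {{OR}} plan in the report. Per the comment above, Hive would instead convert the constants to strings, which would also match {{"1"}} and {{"2"}} here but does not treat {{"1.00"}} and {{1.0}} as equal.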