[ https://issues.apache.org/jira/browse/SPARK-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai resolved SPARK-6201.
-----------------------------
    Resolution: Fixed
 Fix Version/s: 1.4.0

> INSET should coerce types
> -------------------------
>
>                 Key: SPARK-6201
>                 URL: https://issues.apache.org/jira/browse/SPARK-6201
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0, 1.2.1, 1.3.0
>            Reporter: Jianshi Huang
>            Assignee: Adrian Wang
>             Fix For: 1.4.0
>
>
> Suppose we have the following table:
> {code}
> sqlc.jsonRDD(sc.parallelize(Seq("{\"a\": \"1\"}", "{\"a\": \"2\"}",
> "{\"a\": \"3\"}"))).registerTempTable("d")
> {code}
> The schema is:
> {noformat}
> root
>  |-- a: string (nullable = true)
> {noformat}
> Then,
> {code}
> sql("select * from d where (d.a = 1 or d.a = 2)").collect
> =>
> Array([1], [2])
> {code}
> where d.a and the constants 1 and 2 are cast to Double before the
> comparison, as the plan shows:
> {noformat}
> Filter ((CAST(a#155, DoubleType) = CAST(1, DoubleType)) || (CAST(a#155,
> DoubleType) = CAST(2, DoubleType)))
> {noformat}
> However, if I use
> {code}
> sql("select * from d where d.a in (1,2)").collect
> {code}
> the result is empty.
> The physical plan shows that it uses INSET:
> {noformat}
> == Physical Plan ==
> Filter a#155 INSET (1,2)
>  PhysicalRDD [a#155], MappedRDD[499] at map at JsonRDD.scala:47
> {noformat}
> *It seems the INSET implementation in SparkSQL doesn't coerce types
> implicitly, whereas Hive does. We should make SparkSQL conform to Hive's
> behavior, even though implicit coercion is very confusing when comparing
> a String with an Int.*
> Jianshi
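The difference between the two plans can be modelled outside Spark. Below is a minimal, hypothetical Scala sketch (the object InSetCoercionSketch and its helpers are illustrative names, not Spark APIs, and this is not Spark's actual 1.4.0 fix): inSetNoCoercion looks a string value up directly in a set of Int literals, as the INSET plan effectively does, while inSetWithCoercion first casts both sides to Double, mirroring the CAST(..., DoubleType) seen in the OR-based plan.

```scala
// Hypothetical model of the two filter strategies; not Spark's implementation.
object InSetCoercionSketch {
  // No coercion: the String "1" is never equal to the Int 1, so nothing matches.
  def inSetNoCoercion(value: Any, set: Set[Any]): Boolean = set.contains(value)

  // Assumed cast-to-Double rule; non-numeric strings yield None here,
  // loosely analogous to a CAST producing null.
  def toDoubleOpt(v: Any): Option[Double] = v match {
    case s: String => scala.util.Try(s.trim.toDouble).toOption
    case i: Int    => Some(i.toDouble)
    case d: Double => Some(d)
    case _         => None
  }

  // With coercion: cast the set members and the value to Double, then compare.
  def inSetWithCoercion(value: Any, set: Set[Any]): Boolean = {
    val coercedSet: Set[Double] = set.flatMap(v => toDoubleOpt(v))
    toDoubleOpt(value).exists(d => coercedSet.contains(d))
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq("1", "2", "3")      // values of column a (strings)
    val literals: Set[Any] = Set(1, 2) // the IN (1,2) literals (ints)
    println(rows.filter(r => inSetNoCoercion(r, literals)))   // List()
    println(rows.filter(r => inSetWithCoercion(r, literals))) // List(1, 2)
  }
}
```

The first call reproduces the empty result reported for `d.a in (1,2)`; the second matches the `Array([1], [2])` returned by the equivalent OR query.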