[ https://issues.apache.org/jira/browse/SPARK-17108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herman van Hovell updated SPARK-17108:
--------------------------------------
    Description:

I have a Hive table with the following definition:
{noformat}
create table testforerror (
    my_column MAP<BIGINT, ARRAY<String>>
);
{noformat}
The table has the following records:
{noformat}
hive> select * from testforerror;
OK
{11001:["0034111000a4WaAAA2"]}
{11001:["0034111000orWiWAAU"]}
{11001:["","0034111000VgrHdAAJ"]}
{11001:["0034110000cS4rDAAS"]}
{12001:["0037110001a7ofsAAA"]}
Time taken: 0.067 seconds, Fetched: 5 row(s)
{noformat}
I have a query which filters records by a key of my_column. The query is as follows:
{noformat}
select * from testforerror where my_column[11001] is not null;
{noformat}
This query executes fine from the hive/beeline shell and produces the following records:
{noformat}
hive> select * from testforerror where my_column[11001] is not null;
OK
{11001:["0034111000a4WaAAA2"]}
{11001:["0034111000orWiWAAU"]}
{11001:["","0034111000VgrHdAAJ"]}
{11001:["0034110000cS4rDAAS"]}
Time taken: 2.224 seconds, Fetched: 4 row(s)
{noformat}
However, I get an error when trying to execute it from the Spark sqlContext. The following is the error message:
{noformat}
scala> val errorquery = "select * from testforerror where my_column[11001] is not null"
errorquery: String = select * from testforerror where my_column[11001] is not null

scala> sqlContext.sql(errorquery).show()
org.apache.spark.sql.AnalysisException: cannot resolve 'my_column[11001]' due to data type mismatch: argument 2 requires bigint type, however, '11001' is of int type.; line 1 pos 43
	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:65)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
{noformat}
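The analyzer rejects the lookup because the int literal 11001 is not implicitly widened to match the map's BIGINT key type. Until that coercion is handled, an explicit cast on the literal should sidestep the mismatch. A minimal sketch against the same sqlContext (the CAST-based rewrite is a workaround assumption, not a confirmed fix for this ticket):
{noformat}
scala> // Workaround sketch: cast the int literal to bigint so the lookup key
scala> // matches the map's BIGINT key type exactly.
scala> val workaround = "select * from testforerror where my_column[cast(11001 as bigint)] is not null"

scala> // printSchema shows the key type Spark sees for my_column.
scala> sqlContext.table("testforerror").printSchema()

scala> sqlContext.sql(workaround).show()
{noformat}
If the cast resolves as expected, this should return the same four rows the Hive shell produces above.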
> BIGINT and INT comparison failure in spark sql
> ----------------------------------------------
>
>                 Key: SPARK-17108
>                 URL: https://issues.apache.org/jira/browse/SPARK-17108
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Sai Krishna Kishore Beathanabhotla