Hi all, I am using Spark SQL and I have a table stored in a DataFrame that I am trying to restructure. I have an approach that works locally, but when I run the same command on an AWS EC2 instance I get an error reporting an 'unresolved operator'.
Basically I have data that looks like:

    userId  someString  varA
    1       "example1"  [0,2,5]
    2       "example2"  [1,20,5]

and I use an 'explode' command in an sqlContext on varA. When I run this locally things return correctly, but on AWS they fail. I can reproduce this with the following commands:

    val data = List(("1", "example1", Array(0,2,5)), ("2", "example2", Array(1,20,5)))
    val distData = sc.parallelize(data)
    val distTable = distData.toDF("userId", "someString", "varA")
    distTable.registerTempTable("distTable_tmp")
    val temp1 = sqlContext.sql("select userId, someString, varA from distTable_tmp")
    val temp2 = sqlContext.sql("select userId, someString, explode(varA) as varA from distTable_tmp")

Locally, temp1.show() and temp2.show() return what I'd expect, namely:

    scala> temp1.show()
    +------+----------+----------+
    |userId|someString|      varA|
    +------+----------+----------+
    |     1|  example1| [0, 2, 5]|
    |     2|  example2|[1, 20, 5]|
    +------+----------+----------+

    scala> temp2.show()
    +------+----------+----+
    |userId|someString|varA|
    +------+----------+----+
    |     1|  example1|   0|
    |     1|  example1|   2|
    |     1|  example1|   5|
    |     2|  example2|   1|
    |     2|  example2|  20|
    |     2|  example2|   5|
    +------+----------+----+

but on AWS the temp1 sqlContext command works fine, but temp2 fails with the message:

    scala> val temp2 = sqlContext.sql("select userId, someString, explode(varA) as varA from distTable_tmp")
    15/11/05 22:46:49 INFO parse.ParseDriver: Parsing command: select userId, someString, explode(varA) as varA from distTable_tmp
    15/11/05 22:46:49 INFO parse.ParseDriver: Parse Completed
    org.apache.spark.sql.AnalysisException: unresolved operator 'Project [userId#3,someString#4,HiveGenericUdtf#org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode(varA#5) AS varA#6];
    ...

I am just opening the Spark Scala shell via './bin/spark-shell' locally and 'MASTER=yarn-client /home/hadoop/spark/bin/spark-shell' on AWS - I didn't think to use anything except the default sqlContext that seems to be loaded for me.
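To be explicit about the reshaping I am after, here is a minimal sketch of the same row expansion in plain Scala collections, with no Spark involved. The name `exploded` is only illustrative, and this is not meant as a replacement for the SQL 'explode' call, just a statement of the expected result:

```scala
// Plain Scala collections only; can be pasted into the Scala REPL.
val data = List(
  ("1", "example1", Array(0, 2, 5)),
  ("2", "example2", Array(1, 20, 5))
)

// Emit one output row per element of varA,
// repeating userId and someString on each row.
val exploded: List[(String, String, Int)] = data.flatMap {
  case (userId, someString, varA) =>
    varA.toList.map(v => (userId, someString, v))
}

exploded.foreach(println)
```

This produces six rows, three per user, in the same order as the rows of temp2.show() from my local run.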
The Spark versions are 1.5.1 (local) and 1.3.1 (AWS). It was suggested to me that I check what I get when I execute sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext] on AWS, and this returns:

    scala> sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]
    res2: Boolean = true

Locally I didn't compile Spark with Hive, unfortunately, so this command on my local installation returns:

    error: object hive is not a member of package org.apache.spark.sql

Many thanks,
Anthony