Hi all,

I am using Spark SQL and have a table stored in a DataFrame that I am
trying to restructure. I have an approach that works locally, but when I
run the same command on an AWS EC2 instance I get an error reporting an
'unresolved operator'.

Basically I have data that looks like:

userId    someString      varA
   1      "example1"     [0,2,5]
   2      "example2"     [1,20,5]

and I use an 'explode' in a sqlContext query on varA. Locally this returns
what I expect, but on AWS it fails.

I can reproduce this with the following commands:

val data = List(("1", "example1", Array(0,2,5)), ("2", "example2", Array(1,20,5)))
val distData = sc.parallelize(data)
val distTable = distData.toDF("userId", "someString", "varA")
distTable.registerTempTable("distTable_tmp")
val temp1 = sqlContext.sql("select userId, someString, varA from distTable_tmp")
val temp2 = sqlContext.sql("select userId, someString, explode(varA) as varA from distTable_tmp")
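
In case it's relevant, the same flattening could also be written against the
DataFrame API instead of a SQL string - I haven't checked whether that
behaves any differently on 1.3.1, and 'varAFlat' below is just an
illustrative name for the output column:

// Same explode expressed via DataFrame.explode rather than the SQL explode() call.
// Each Seq[Int] in varA is flattened into one row per element, named varAFlat.
val temp2Alt = distTable.explode("varA", "varAFlat") { a: Seq[Int] => a }
temp2Alt.select("userId", "someString", "varAFlat").show()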

Locally, temp1.show() and temp2.show() return what I'd expect, namely:

scala> temp1.show()
+------+----------+----------+
|userId|someString|      varA|
+------+----------+----------+
|     1|  example1| [0, 2, 5]|
|     2|  example2|[1, 20, 5]|
+------+----------+----------+

scala> temp2.show()
+------+----------+----+
|userId|someString|varA|
+------+----------+----+
|     1|  example1|   0|
|     1|  example1|   2|
|     1|  example1|   5|
|     2|  example2|   1|
|     2|  example2|  20|
|     2|  example2|   5|
+------+----------+----+

but on AWS the temp1 query works fine, while temp2 fails with the following:

scala> val temp2 = sqlContext.sql("select userId, someString, explode(varA)
as varA from distTable_tmp")
15/11/05 22:46:49 INFO parse.ParseDriver: Parsing command: select userId,
someString, explode(varA) as varA from distTable_tmp
15/11/05 22:46:49 INFO parse.ParseDriver: Parse Completed
org.apache.spark.sql.AnalysisException: unresolved operator 'Project
[userId#3,someString#4,HiveGenericUdtf#org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode(varA#5)
AS varA#6];
...
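
In case the column type matters here, I'd expect varA to come through as an
array of ints in both environments; that can be confirmed with printSchema:

// Show how varA was inferred when the DataFrame was built from the local collection;
// it should appear as an array of integer elements.
distTable.printSchema()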

I am just opening the Spark Scala shell via './bin/spark-shell' locally and
'MASTER=yarn-client /home/hadoop/spark/bin/spark-shell' on AWS, and I am
using the default sqlContext that each shell loads for me. The Spark
versions are 1.5.1 (local) and 1.3.1 (AWS).

It was suggested that I check what I get when I execute
sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext] on AWS, and
this returns:

scala> sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]
res2: Boolean = true

Locally I didn't compile Spark with Hive, unfortunately, so this command on
my local installation returns:

error: object hive is not a member of package org.apache.spark.sql
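
As a way around that, the runtime class of the context can be checked
without referencing the Hive class at all, since getClass is plain Scala
and works whether or not Spark was built with Hive:

// Prints the concrete SQLContext implementation the shell created;
// for a build without Hive this should be org.apache.spark.sql.SQLContext,
// and on the AWS cluster org.apache.spark.sql.hive.HiveContext per the check above.
sqlContext.getClass.getName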

Many thanks,
Anthony
