RE: spark sql: join sql fails after sqlCtx.cacheTable()
Using HiveContext solved it.
RE: spark sql: join sql fails after sqlCtx.cacheTable()
I am getting an exception in spark-shell at the following line:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

error: bad symbolic reference. A signature in HiveContext.class refers to term hive
in package org.apache.hadoop which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling HiveContext.class.
error: while compiling: <console>
       during phase: erasure
    library version: version 2.10.4
   compiler version: version 2.10.4

[lengthy compiler tree dump omitted]

uncaught exception during compilation: scala.reflect.internal.Types$TypeError
scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in
HiveContext.class refers to term conf in value org.apache.hadoop.hive which is
not available. It may be completely missing from the current classpath, or the
version on the classpath might be incompatible with the version used when
compiling HiveContext.class.
That entry seems to have slain the compiler. Shall I replay your session?
I can re-run each line except the last one.

Thanks
Tridib
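For anyone who hits this same "bad symbolic reference" error: in Spark 1.x, Hive support was not included in the default Spark assembly, so HiveContext only resolves if Spark was built with the Hive profile. Assuming you build Spark from source, the 1.x-era build docs gave commands along these lines (check the docs for your exact version):

# Rebuild the Spark assembly with Hive support enabled
sbt/sbt -Phive assembly
# or, with Maven
mvn -Phive -DskipTests clean package

Then restart spark-shell against the rebuilt assembly.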
Re: spark sql: join sql fails after sqlCtx.cacheTable()
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val personPath = "/hdd/spark/person.json"
val person = sqlContext.jsonFile(personPath)
person.printSchema()
person.registerTempTable("person")
val addressPath = "/hdd/spark/address.json"
val address = sqlContext.jsonFile(addressPath)
address.printSchema()
address.registerTempTable("address")
sqlContext.cacheTable("person")
sqlContext.cacheTable("address")
val rs2 = sqlContext.sql("SELECT p.id, p.name, a.city FROM person p, address a where p.id = a.id limit 10").collect.foreach(println)

person.json:
{"id":1,"name":"Mr. X"}

address.json:
{"city":"Earth","id":1}
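For reference, the same query written with an explicit JOIN ... ON clause, the form used in the working HiveContext snippet later in this thread (a sketch assuming the person and address tables registered above; rs3 is just an illustrative name):

// Explicit-join form of the same query; note the fix actually reported
// in this thread was switching to HiveContext, not rewriting the join.
val rs3 = sqlContext.sql("SELECT p.id, p.name, a.city FROM person p JOIN address a ON (p.id = a.id) LIMIT 10").collect.foreach(println)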
Re: spark sql: join sql fails after sqlCtx.cacheTable()
Hi Tridib, I changed SQLContext to HiveContext and it started working. These are the steps I used:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val person = sqlContext.jsonFile("json/person.json")
person.printSchema()
person.registerTempTable("person")
val address = sqlContext.jsonFile("json/address.json")
address.printSchema()
address.registerTempTable("address")
sqlContext.cacheTable("person")
sqlContext.cacheTable("address")
val rs2 = sqlContext.sql("select p.id, p.name, a.city from person p join address a on (p.id = a.id)").collect.foreach(println)

Rishi @ InfoObjects
Pure-play Big Data Consulting
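A small follow-on sketch (table names taken from the snippet above): SQLContext, and therefore HiveContext, in the Spark 1.x line also exposes isCached and uncacheTable, which help confirm that cacheTable took effect and release the memory once the join is done:

// Verify the cache, then release both tables
println(sqlContext.isCached("person")) // expected: true after cacheTable
sqlContext.uncacheTable("person")
sqlContext.uncacheTable("address")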
Re: spark sql: join sql fails after sqlCtx.cacheTable()
Hmm... I thought HiveContext would only work if Hive is present. I am curious to know when to use HiveContext and when to use SQLContext.

Thanks & Regards
Tridib
Re: spark sql: join sql fails after sqlCtx.cacheTable()
> Hmm... I thought HiveContext would only work if Hive is present. I am curious to know when to use HiveContext and when to use SQLContext.

http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started

TL;DR: Always use HiveContext if your application does not have a dependency conflict with the Hive jars. :)
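To expand on that link: HiveContext extends SQLContext, so it is a drop-in superset with the same jsonFile / registerTempTable / cacheTable / sql API, plus the more complete HiveQL parser and Hive UDFs. It does not need a Hive installation; without a hive-site.xml it simply creates a local metastore_db in the current working directory. A minimal sketch of the swap (val names are illustrative):

// Basic context: simple SQL parser only
val plainCtx = new org.apache.spark.sql.SQLContext(sc)
// Superset context: same API plus HiveQL; no Hive install required
val hiveCtx = new org.apache.spark.sql.hive.HiveContext(sc)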
Re: spark sql: join sql fails after sqlCtx.cacheTable()
Thanks for pointing that out.