HiveContext on Spark 1.6 Linkage Error: ClassCastException
Hello all, hope you are well. I am using HiveContext on Spark 1.6, developing in Eclipse, and I have placed hive-site.xml on the classpath so that I use the Hive instance running on my cluster instead of creating a local metastore and a local warehouse. So far so good: in this scenario, SELECT * and INSERT INTO queries work fine, but the problem arises when trying to drop tables and/or create new ones. Assuming it is not a permission problem, my issue is:

    ClassCastException: attempting to cast jar:file://.../com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar!javax/ws/rs/ext/RuntimeDelegate.class
    to jar:file://.../com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar!javax/ws/rs/ext/RuntimeDelegate.class

As you can see, it is attempting to cast the class to itself from the same jar, and it throws the exception. I think this is because the same jar has been loaded twice by different classloaders: one copy is loaded by org.apache.spark.sql.hive.client.IsolatedClientLoader and the other by sun.misc.Launcher$AppClassLoader. Any suggestion to fix this issue? The same happens when building the jar and running it with spark-submit (YARN RM).

Cheers, best
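A frequent cause of this kind of LinkageError is that the Hive client classes loaded through IsolatedClientLoader see a second copy of jersey-core that the application classloader has already loaded, so RuntimeDelegate resolves to two different Class objects. One thing worth trying (a sketch, not a confirmed fix for this exact setup; the class name and jar name below are placeholders) is to avoid shipping a second jersey copy with the application, and to experiment with the classloader-ordering settings:

```shell
# Sketch: first, make sure the application jar does not bundle its own
# jersey-core; in sbt this usually means marking Spark as provided so its
# transitive jersey dependency is not re-shipped:
#   "org.apache.spark" %% "spark-hive" % "1.6.0" % "provided"
#
# If the conflict persists, experiment with classloader ordering. These
# settings exist in Spark 1.3+, but flipping them can trade one conflict
# for another, so test both values:
spark-submit \
  --conf spark.driver.userClassPathFirst=false \
  --conf spark.executor.userClassPathFirst=false \
  --class your.Main yourapp.jar
```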
Re: Creating HiveContext within Spark streaming
Ok I managed to sort that one out. This is what I am facing:

    val sparkConf = new SparkConf().
      setAppName(sparkAppName).
      set("spark.driver.allowMultipleContexts", "true").
      set("spark.hadoop.validateOutputSpecs", "false")
    // change the values accordingly
    sparkConf.set("sparkDefaultParllelism", sparkDefaultParallelismValue)
    sparkConf.set("sparkSerializer", sparkSerializerValue)
    sparkConf.set("sparkNetworkTimeOut", sparkNetworkTimeOutValue)
    // If you want to see more details of batches please increase the value
    // and that will be shown in the UI
    sparkConf.set("sparkStreamingUiRetainedBatches", sparkStreamingUiRetainedBatchesValue)
    sparkConf.set("sparkWorkerUiRetainedDrivers", sparkWorkerUiRetainedDriversValue)
    sparkConf.set("sparkWorkerUiRetainedExecutors", sparkWorkerUiRetainedExecutorsValue)
    sparkConf.set("sparkWorkerUiRetainedStages", sparkWorkerUiRetainedStagesValue)
    sparkConf.set("sparkUiRetainedJobs", sparkUiRetainedJobsValue)
    sparkConf.set("enableHiveSupport", enableHiveSupportValue)
    sparkConf.set("spark.streaming.stopGracefullyOnShutdown", "true")
    sparkConf.set("spark.streaming.receiver.writeAheadLog.enable", "true")
    sparkConf.set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
    sparkConf.set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")

    var sqltext = ""
    val batchInterval = 2
    val streamingContext = new StreamingContext(sparkConf, Seconds(batchInterval))

With the above settings, Spark streaming works fine.
However, after adding the first line below:

    val sparkContext = new SparkContext(sparkConf)
    val HiveContext = new HiveContext(streamingContext.sparkContext)

I get the following errors:

    16/09/08 14:02:32 ERROR JobScheduler: Error running job streaming job 1473339752000 ms.0
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times,
    most recent failure: Lost task 1.3 in stage 0.0 (TID 7, 50.140.197.217): java.io.IOException:
    org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1260)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
        at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:67)
        at org.apache.spark.scheduler.Task.run(Task.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0

Hm, any ideas?

Thanks

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
Creating HiveContext within Spark streaming
Hi,

This may not be feasible in Spark streaming.

I am trying to create a HiveContext in Spark streaming within the streaming context:

    // Create a local StreamingContext with two working threads and a batch
    // interval of 2 seconds.
    val sparkConf = new SparkConf().
      setAppName(sparkAppName).
      set("spark.driver.allowMultipleContexts", "true").
      set("spark.hadoop.validateOutputSpecs", "false")

Now try to create an sc:

    val sc = new SparkContext(sparkConf)
    val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

This is accepted, but it creates two Spark jobs [image: Inline images 1] and basically it goes into a waiting state.

Any ideas how one can create a HiveContext within Spark streaming?

Thanks
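For what it is worth, a pattern that avoids constructing a second SparkContext is to create the context once and derive both the StreamingContext and the HiveContext from it. The sketch below is against the Spark 1.6 API and assumes `sparkAppName` is defined as in the post; it is a sketch of the pattern, not a tested application:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName(sparkAppName)

// One SparkContext for the whole application...
val sc = new SparkContext(sparkConf)

// ...from which both the streaming and the Hive contexts are derived,
// so no second SparkContext is ever constructed and
// spark.driver.allowMultipleContexts is not needed.
val streamingContext = new StreamingContext(sc, Seconds(2))
val hiveContext = new HiveContext(sc)
```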
HiveContext in Spark
I am not able to use INSERT, UPDATE or DELETE commands in HiveContext. I am using Spark 1.6.1 and Hive 1.1.0. Please find the error below:

    scala> hc.sql("delete from trans_detail where counter=1");
    16/04/12 14:58:45 INFO ParseDriver: Parsing command: delete from trans_detail where counter=1
    16/04/12 14:58:45 INFO ParseDriver: Parse Completed
    16/04/12 14:58:45 INFO ParseDriver: Parsing command: delete from trans_detail where counter=1
    16/04/12 14:58:45 INFO ParseDriver: Parse Completed
    16/04/12 14:58:45 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:60409 in memory (size: 46.9 KB, free: 536.7 MB)
    16/04/12 14:58:46 INFO ContextCleaner: Cleaned accumulator 3
    16/04/12 14:58:46 INFO BlockManagerInfo: Removed broadcast_4_piece0 on localhost:60409 in memory (size: 3.6 KB, free: 536.7 MB)
    org.apache.spark.sql.AnalysisException: Unsupported language features in query: delete from trans_detail where counter=1
    TOK_DELETE_FROM 1, 0,11, 13
      TOK_TABNAME 1, 5,5, 13
        trans_detail 1, 5,5, 13
      TOK_WHERE 1, 7,11, 39
        = 1, 9,11, 39
          TOK_TABLE_OR_COL 1, 9,9, 32
            counter 1, 9,9, 32
          1 1, 11,11, 40
    scala.NotImplementedError: No parse rules for TOK_DELETE_FROM:
    TOK_DELETE_FROM 1, 0,11, 13
      TOK_TABNAME 1, 5,5, 13
        trans_detail 1, 5,5, 13
      TOK_WHERE 1, 7,11, 39
        = 1, 9,11, 39
          TOK_TABLE_OR_COL 1, 9,9, 32
            counter 1, 9,9, 32
          1 1, 11,11, 40
    org.apache.spark.sql.hive.HiveQl$.nodeToPlan(HiveQl.scala:1217)

-- Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
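The error above is expected: Spark 1.6's HiveQL parser has no rules for TOK_DELETE_FROM (or UPDATE), so row-level DML is unsupported regardless of the underlying Hive version. A common workaround, sketched here with the table and column from the post (the staging table name is illustrative), is to rewrite the data without the rows DELETE would have removed:

```scala
// Sketch (Spark 1.6 API): emulate DELETE by filtering and rewriting.
// hc is the existing HiveContext from the post.
val df = hc.table("trans_detail")

// Keep everything except the rows the DELETE targeted.
val remaining = df.filter("counter <> 1")

// Write to a staging table and swap it in afterwards; overwriting the
// same table being read is not safe within a single job.
remaining.write.mode("overwrite").saveAsTable("trans_detail_staged")
```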
Re: Creating HiveContext in Spark-Shell fails
This sqlContext is one instance of HiveContext; do not be confused by the name.

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Creating HiveContext in Spark-Shell fails
Thanks Mark, that answers my question.
Re: Creating HiveContext in Spark-Shell fails
Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
          /_/

    Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_72)
    Type in expressions to have them evaluated.
    Type :help for more information.

    scala> sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]
    res0: Boolean = true
Creating HiveContext in Spark-Shell fails
Hi All,

Creating a HiveContext in spark-shell fails with:

    Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /SPARK/metastore_db.

Spark-shell has already created metastore_db for sqlContext:

    Spark context available as sc.
    SQL context available as sqlContext.

But without HiveContext, I am able to query the data using sqlContext:

    scala> var df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/SPARK/abc")
    df: org.apache.spark.sql.DataFrame = [Prabhu: string, Joseph: string]

So is there any real need for HiveContext inside the Spark shell? Is everything that can be done with HiveContext achievable with SQLContext inside the Spark shell?

Thanks,
Prabhu Joseph
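The XSDB6 error happens because the shell's built-in sqlContext and a hand-constructed HiveContext both open the same embedded Derby metastore, and embedded Derby allows only one connection. In a spark-shell built with Hive support there is normally no need for a second context; a sketch of how to check and reuse the existing one:

```scala
// In spark-shell, the pre-created sqlContext is already a HiveContext when
// Spark is compiled with Hive support, so reuse it instead of constructing
// a second one (which opens a second embedded-Derby connection):
sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]

// Hive features, e.g. metastore-backed tables, then work through the
// same sqlContext:
sqlContext.sql("SHOW TABLES").show()
```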
Sharing HiveContext in Spark JobServer / getOrCreate
Hi

I am using a shared SparkContext for all of my Spark jobs. Some of the jobs use HiveContext, but there isn't a getOrCreate method on HiveContext that allows reuse of an existing HiveContext. Such a method exists on SQLContext only (def getOrCreate(sparkContext: SparkContext): SQLContext).

Is there any reason that a HiveContext cannot be shared amongst multiple threads within the same Spark driver process?

In addition, I cannot seem to be able to cast a HiveContext to a SQLContext, although this works fine in the Spark shell. Am I doing something wrong here?

    scala> sqlContext
    res19: org.apache.spark.sql.SQLContext = org.apache.spark.sql.hive.HiveContext@383b3357

    scala> import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.SQLContext

    scala> SQLContext.getOrCreate(sc)
    res18: org.apache.spark.sql.SQLContext = org.apache.spark.sql.hive.HiveContext@383b3357

Regards
Deenar
Re: Sharing HiveContext in Spark JobServer / getOrCreate
Have you noticed the following method of HiveContext?

    /**
     * Returns a new HiveContext as new session, which will have separated SQLConf, UDF/UDAF,
     * temporary tables and SessionState, but sharing the same CacheManager, IsolatedClientLoader
     * and Hive client (both of execution and metadata) with existing HiveContext.
     */
    override def newSession(): HiveContext = {

Cheers
Re: Sharing HiveContext in Spark JobServer / getOrCreate
On 25 January 2016 at 21:09, Deenar Toraskar <deenar.toras...@thinkreactive.co.uk> wrote:

No, I hadn't. This is useful, but in some cases we do want to share the same temporary tables between jobs, so I really wanted a getOrCreate equivalent on HiveContext.

Deenar
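Since HiveContext has no getOrCreate of its own, one way to share a single instance (and therefore its temporary tables) across jobs is a small double-checked holder. The sketch below is generic so the pattern is runnable on its own; in the job server it would be instantiated with key = SparkContext and value = HiveContext. The class and names are illustrative, not part of any Spark API:

```scala
// Sketch of a getOrCreate-style holder for a shared context instance.
// The first caller creates the value; every later caller, from any
// thread, gets the same instance back.
class SharedInstance[K, V](create: K => V) {
  @volatile private var instance: Option[V] = None

  def getOrCreate(key: K): V = instance.getOrElse {
    synchronized {
      // Re-check under the lock in case another thread created it first.
      instance.getOrElse {
        val v = create(key)
        instance = Some(v)
        v
      }
    }
  }
}
```

Usage with HiveContext would be `val hive = holder.getOrCreate(sc)` from each job, where `holder` is a `SharedInstance[SparkContext, HiveContext](new HiveContext(_))` created once per driver.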
Re: HiveContext test, Spark Context did not initialize after waiting 10000ms
I got a similar problem. I'm not sure if your problem is already resolved. For the record, I solved this type of error by calling setMaster("yarn-cluster") on the conf. If you find the solution, please let us know.

Regards,
Mohammad
Re: HiveContext test, Spark Context did not initialize after waiting 10000ms
That is a much better solution than how I resolved it. I got around it by placing comma-separated jar paths for all the Hive-related jars in the --jars clause. I will try your solution. Thanks for sharing it.
HiveContext test, Spark Context did not initialize after waiting 10000ms
I am trying to run a Hive query from Spark using HiveContext. Here is the code:

val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")
conf.set("spark.executor.extraClassPath", "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
conf.set("spark.driver.extraClassPath", "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
conf.set("spark.yarn.am.waitTime", "30L")
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)
def inputRDD = sqlContext.sql("describe spark_poc.src_digital_profile_user")
inputRDD.collect().foreach { println }
println(inputRDD.schema.getClass.getName)

Getting this exception. Any clues? The weird part is that if I try to do the same thing in Java instead of Scala, it runs fine.

Exception in thread "Driver" java.lang.NullPointerException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162) 15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 1 ms. Please check earlier log output for errors. Failing the application. 
Exception in thread "main" java.lang.NullPointerException at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218) at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110) at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52) at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433) at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) 15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a signal. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-test-Spark-Context-did-not-initialize-after-waiting-1ms-tp21953.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: HiveContext test, Spark Context did not initialize after waiting 10000ms
On Fri, Mar 6, 2015 at 2:47 PM, nitinkak001 nitinkak...@gmail.com wrote:
I am trying to run a Hive query from Spark using HiveContext. Here is the code:

val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")
conf.set("spark.executor.extraClassPath", "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
conf.set("spark.driver.extraClassPath", "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
conf.set("spark.yarn.am.waitTime", "30L")

You're missing /* at the end of your classpath entries. Also, since you're on CDH 5.2, you'll probably need to filter out the Guava jar from Hive's lib directory, otherwise things might break, so things will get a little more complicated. With CDH 5.3 you shouldn't need to filter out the Guava jar. -- Marcelo
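Marcelo's point about the trailing /* can be illustrated with a toy shell session. The jar names and directory below are invented, and the shell is used only to expand the same wildcard pattern the JVM would, so we can count what it matches.

```shell
# A bare directory on a JVM classpath only exposes loose .class files --
# it does not pick up the jars inside it. The trailing /* wildcard is
# what makes the JVM add every jar in the directory. Jar names are made
# up; we let the shell expand the pattern just to count its matches.
HIVE_LIB=$(mktemp -d)
touch "$HIVE_LIB/hive-exec.jar" "$HIVE_LIB/hive-metastore.jar"

BAD_CP="$HIVE_LIB"      # as in the original code: no jars matched
GOOD_CP="$HIVE_LIB/*"   # what Marcelo suggests: every jar in the dir

ls $GOOD_CP | wc -l     # prints 2
```

Applied to the original snippet, this means setting spark.executor.extraClassPath and spark.driver.extraClassPath to .../lib/hive/lib/* rather than .../lib/hive/lib.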
Re: Unable to use HiveContext in spark-shell
Help please! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18280.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Unable to use HiveContext in spark-shell
Can you be more specific? What version of Spark, Hive, Hadoop, etc. are you on? What are you trying to do? What are the issues you are seeing? J *JIMMY MCERLAIN* DATA SCIENTIST (NERD) *E*: ji...@sellpoints.com *M*: *510.303.7751* On Thu, Nov 6, 2014 at 9:22 AM, tridib tridib.sama...@live.com wrote: Help please! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18280.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Unable to use HiveContext in spark-shell
What version of Spark are you using? Did you compile your Spark version and if so, what compile options did you use? On 11/6/14, 9:22 AM, tridib tridib.sama...@live.com wrote: Help please! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18280.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
RE: Unable to use HiveContext in spark-shell
Subject: RE: Unable to use HiveContext in spark-shell
Date: Thu, 6 Nov 2014 17:38:51 +

What version of Spark are you using? Did you compile your Spark version and if so, what compile options did you use? On 11/6/14, 9:22 AM, tridib tridib.sama...@live.com wrote: Help please! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18280.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Unable to use HiveContext in spark-shell
Those are the same options I used, except I had --tgz to package it, and I built off of the master branch. Unfortunately, my only guess is that these errors stem from your build environment. In your Spark assembly, do you have any classes which belong to the org.apache.hadoop.hive package? From: Tridib Samanta tridib.sama...@live.com Date: Thursday, November 6, 2014 at 9:49 AM To: Terry Siu terry@smartfocus.com, u...@spark.incubator.apache.org Subject: RE: Unable to use HiveContext in spark-shell I am using Spark 1.1.0. I built it using: ./make-distribution.sh -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests My ultimate goal is to execute a query on a parquet file with nested structure and cast a date string to Date; this is required to calculate the age of a Person entity. But I cannot even get past this line: val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) I made sure that the org.apache.hadoop package is in the spark assembly jar. Re-attaching the stack trace for quick reference. scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) error: bad symbolic reference. A signature in HiveContext.class refers to term hive in package org.apache.hadoop which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling HiveContext.class. 
error: while compiling: console during phase: erasure library version: version 2.10.4 compiler version: version 2.10.4 reconstructed args: last tree to typer: Apply(value $outer) symbol: value $outer (flags: method synthetic stable expandedname triedcooking) symbol definition: val $outer(): $iwC.$iwC.type tpe: $iwC.$iwC.type symbol owners: value $outer - class $iwC - class $iwC - class $iwC - class $read - package $line5 context owners: class $iwC - class $iwC - class $iwC - class $iwC - class $read - package $line5 == Enclosing template or block == ClassDef( // class $iwC extends Serializable 0 $iwC [] Template( // val local $iwC: notype, tree.tpe=$iwC java.lang.Object, scala.Serializable // parents ValDef( private _ tpt empty ) // 5 statements DefDef( // def init(arg$outer: $iwC.$iwC.$iwC.type): $iwC method triedcooking init [] // 1 parameter list ValDef( // $outer: $iwC.$iwC.$iwC.type $outer tpt // tree.tpe=$iwC.$iwC.$iwC.type empty ) tpt // tree.tpe=$iwC Block( // tree.tpe=Unit Apply( // def init(): Object in class Object, tree.tpe=Object $iwC.super.init // def init(): Object in class Object, tree.tpe=()Object Nil ) () ) ) ValDef( // private[this] val sqlContext: org.apache.spark.sql.hive.HiveContext private local triedcooking sqlContext tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext Apply( // def init(sc: org.apache.spark.SparkContext): org.apache.spark.sql.hive.HiveContext in class HiveContext, tree.tpe=org.apache.spark.sql.hive.HiveContext new org.apache.spark.sql.hive.HiveContext.init // def init(sc: org.apache.spark.SparkContext): org.apache.spark.sql.hive.HiveContext in class HiveContext, tree.tpe=(sc: org.apache.spark.SparkContext)org.apache.spark.sql.hive.HiveContext Apply( // val sc(): org.apache.spark.SparkContext, tree.tpe=org.apache.spark.SparkContext $iwC.this.$line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$$outer().$VAL1().$iw().$iw().sc // val sc(): org.apache.spark.SparkContext, 
tree.tpe=()org.apache.spark.SparkContext Nil ) ) ) DefDef( // val sqlContext(): org.apache.spark.sql.hive.HiveContext method stable accessor sqlContext [] List(Nil) tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext $iwC.this.sqlContext // private[this] val sqlContext: org.apache.spark.sql.hive.HiveContext, tree.tpe=org.apache.spark.sql.hive.HiveContext ) ValDef( // protected val $outer: $iwC.$iwC.$iwC.type protected synthetic paramaccessor triedcooking $outer tpt // tree.tpe=$iwC.$iwC.$iwC.type empty ) DefDef( // val $outer(): $iwC.$iwC.$iwC.type method synthetic stable expandedname triedcooking $line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer [] List(Nil) tpt // tree.tpe=Any $iwC.this.$outer // protected val $outer: $iwC.$iwC.$iwC.type, tree.tpe=$iwC.$iwC.$iwC.type ) ) ) == Expanded type of tree == ThisType(class $iwC) uncaught exception during compilation: scala.reflect.internal.Types$TypeError scala.reflect.internal.Types
Re: Unable to use HiveContext in spark-shell
Yes. I have org.apache.hadoop.hive package in spark assembly. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18322.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Unable to use HiveContext in spark-shell
I built spark-1.1.0 on a fresh new machine. This issue is gone! Thank you all for your help. Thanks & Regards, Tridib -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18324.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Unable to use HiveContext in spark-shell
I am connecting to a remote master using spark shell. Then I am getting following error while trying to instantiate HiveContext. scala val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) error: bad symbolic reference. A signature in HiveContext.class refers to term hive in package org.apache.hadoop which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling HiveContext.class. error: while compiling: console during phase: erasure library version: version 2.10.4 compiler version: version 2.10.4 reconstructed args: last tree to typer: Apply(value $outer) symbol: value $outer (flags: method synthetic stable expandedname triedcooking) symbol definition: val $outer(): $iwC.$iwC.type tpe: $iwC.$iwC.type symbol owners: value $outer - class $iwC - class $iwC - class $iwC - class $read - package $line5 context owners: class $iwC - class $iwC - class $iwC - class $iwC - class $read - package $line5 == Enclosing template or block == ClassDef( // class $iwC extends Serializable 0 $iwC [] Template( // val local $iwC: notype, tree.tpe=$iwC java.lang.Object, scala.Serializable // parents ValDef( private _ tpt empty ) // 5 statements DefDef( // def init(arg$outer: $iwC.$iwC.$iwC.type): $iwC method triedcooking init [] // 1 parameter list ValDef( // $outer: $iwC.$iwC.$iwC.type $outer tpt // tree.tpe=$iwC.$iwC.$iwC.type empty ) tpt // tree.tpe=$iwC Block( // tree.tpe=Unit Apply( // def init(): Object in class Object, tree.tpe=Object $iwC.super.init // def init(): Object in class Object, tree.tpe=()Object Nil ) () ) ) ValDef( // private[this] val sqlContext: org.apache.spark.sql.hive.HiveContext private local triedcooking sqlContext tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext Apply( // def init(sc: org.apache.spark.SparkContext): org.apache.spark.sql.hive.HiveContext in class HiveContext, tree.tpe=org.apache.spark.sql.hive.HiveContext new 
org.apache.spark.sql.hive.HiveContext.init // def init(sc: org.apache.spark.SparkContext): org.apache.spark.sql.hive.HiveContext in class HiveContext, tree.tpe=(sc: org.apache.spark.SparkContext)org.apache.spark.sql.hive.HiveContext Apply( // val sc(): org.apache.spark.SparkContext, tree.tpe=org.apache.spark.SparkContext $iwC.this.$line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$$outer().$VAL1().$iw().$iw().sc // val sc(): org.apache.spark.SparkContext, tree.tpe=()org.apache.spark.SparkContext Nil ) ) ) DefDef( // val sqlContext(): org.apache.spark.sql.hive.HiveContext method stable accessor sqlContext [] List(Nil) tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext $iwC.this.sqlContext // private[this] val sqlContext: org.apache.spark.sql.hive.HiveContext, tree.tpe=org.apache.spark.sql.hive.HiveContext ) ValDef( // protected val $outer: $iwC.$iwC.$iwC.type protected synthetic paramaccessor triedcooking $outer tpt // tree.tpe=$iwC.$iwC.$iwC.type empty ) DefDef( // val $outer(): $iwC.$iwC.$iwC.type method synthetic stable expandedname triedcooking $line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer [] List(Nil) tpt // tree.tpe=Any $iwC.this.$outer // protected val $outer: $iwC.$iwC.$iwC.type, tree.tpe=$iwC.$iwC.$iwC.type ) ) ) == Expanded type of tree == ThisType(class $iwC) uncaught exception during compilation: scala.reflect.internal.Types$TypeError scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in HiveContext.class refers to term conf in value org.apache.hadoop.hive which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling HiveContext.class. That entry seems to have slain the compiler. Shall I replay your session? I can re-run each line except the last one. 
[y/n] Thanks Tridib -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Got error “java.lang.IllegalAccessError” when using HiveContext in Spark shell on AWS
Hi, when I try to use HiveContext in the Spark shell on AWS, I get the error java.lang.IllegalAccessError: tried to access method com.google.common.collect.MapMaker.makeComputingMap(Lcom/google/common/base/Function;)Ljava/util/concurrent/ConcurrentMap. I followed the steps below to compile and install Spark (I tested 1.0.0, 1.0.1 and 1.0.2).

Step 1: ./make-distribution.sh --hadoop 2.4.0 --with-hive --tgz
Success!

Step 2: elastic-mapreduce --create --alive --name "Spark Test" --ami-version 3.1.0 --instance-type m3.xlarge --instance-count 2
Hadoop version: 2.4.0, Hive: 0.11.0. Success!

Step 3: wget --no-check-certificate https://s3.amazonaws.com/spark-related-packages/scala-2.10.3.tgz

Step 4: install and configure Hive, Spark and Scala.
# edit hive-site.xml: add the account and password for Amazon RDS to retrieve Hive's remote metastore. Successfully connected to RDS!
# edit bashrc: vim /home/hadoop/.bashrc
export SCALA_HOME=/home/hadoop/.versions/scala-2.10.3
# create spark-env: vim /home/hadoop/spark/conf/spark-env.sh
export SPARK_MASTER_IP=10.218.180.250
export SCALA_HOME=/home/hadoop/.versions/scala-2.10.3
export SPARK_LOCAL_DIRS=/mnt/spark/
export SPARK_CLASSPATH=/usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
# copy core-site.xml to spark and shark
cp /home/hadoop/conf/core-site.xml /home/hadoop/spark/conf/

Step 5: start Spark: /home/hadoop/spark/sbin/start-master.sh
Spark can read and write data in Amazon S3.

Step 6: ./spark/bin/spark-shell --master spark://10.218.180.250:7077 --driver-class-path spark/lib/mysql-connector-java-5.1.26-bin.jar

Step 7: error log
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
14/08/07 09:38:39 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. 
Instead, use mapreduce.input.fileinputformat.input.dir.recursive 14/08/07 09:38:39 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 14/08/07 09:38:39 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 14/08/07 09:38:39 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 14/08/07 09:38:39 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 14/08/07 09:38:39 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 14/08/07 09:38:39 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@45be296f scala> import hiveContext._ import hiveContext._ scala> hql("show tables") 14/08/07 09:38:48 INFO parse.ParseDriver: Parsing command: show tables 14/08/07 09:38:48 INFO parse.ParseDriver: Parse Completed 14/08/07 09:38:48 INFO analysis.Analyzer: Max iterations (2) reached for batch MultiInstanceRelations 14/08/07 09:38:48 INFO analysis.Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences 14/08/07 09:38:48 INFO analysis.Analyzer: Max iterations (2) reached for batch Check Analysis 14/08/07 09:38:48 INFO sql.SQLContext$$anon$1: Max iterations (2) reached for batch Add exchange 14/08/07 09:38:48 INFO sql.SQLContext$$anon$1: Max iterations (2) reached for batch Prepare Expressions 14/08/07 09:38:49 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. 
Instead, use mapreduce.input.fileinputformat.input.dir.recursive 14/08/07 09:38:49 INFO ql.Driver: PERFLOG method=Driver.run 14/08/07 09:38:49 INFO ql.Driver: PERFLOG method=TimeToSubmit 14/08/07 09:38:49 INFO ql.Driver: PERFLOG method=compile 14/08/07 09:38:49 INFO ql.Driver: PERFLOG method=parse 14/08/07 09:38:49 INFO parse.ParseDriver: Parsing command: show tables 14/08/07 09:38:49 INFO parse.ParseDriver: Parse Completed 14/08/07 09:38:49 INFO ql.Driver: /PERFLOG method=parse start=1407404329052 end=1407404329052 duration=0 14/08/07 09:38:49 INFO ql.Driver: PERFLOG method=semanticAnalyze 14/08/07 09:38:49 INFO ql.Driver: Semantic Analysis Completed 14/08/07 09:38:49 INFO ql.Driver: /PERFLOG method=semanticAnalyze start=1407404329052 end=1407404329189 duration=137 14/08/07 09:38:49 INFO exec.ListSinkOperator: Initializing Self 0 OP 14/08/07 09:38:49 INFO exec.ListSinkOperator: Operator 0 OP initialized 14/08/07 09:38:49 INFO exec.ListSinkOperator: Initialization Done 0 OP 14/08/07 09:38:49 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) 14/08/07 09:38:49 INFO ql.Driver: /PERFLOG method=compile start
Re: Got error “java.lang.IllegalAccessError” when using HiveContext in Spark shell on AWS
Hey Zhun, Thanks for the detailed problem description. Please see my comments inlined below. On Thu, Aug 7, 2014 at 6:18 PM, Zhun Shen shenzhunal...@gmail.com wrote: Caused by: java.lang.IllegalAccessError: tried to access method com.google.common.collect.MapMaker.makeComputingMap(Lcom/google/common/base/Function;)Ljava/util/concurrent/ConcurrentMap; from class com.jolbox.bonecp.BoneCPDataSource This line indicates that accessing MapMaker.makeComputingMap via Java reflection fails. The version of Guava we use in Spark SQL (as a transitive dependency) is 14.0.1. In this version, MapMaker.makeComputingMap is still public (https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/MapMaker.java?name=v14.0.1#581), but in newer versions (say 15.0) it is no longer public (https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/MapMaker.java?name=v15.0). So my guess is that a newer version of the Guava library in your classpath somehow shadows the version Spark SQL uses. A quick and dirty fix to see whether this is true is to put the Guava 14.0.1 jar file at the beginning of your classpath and see whether things work. at com.jolbox.bonecp.BoneCPDataSource.init(BoneCPDataSource.java:64) at org.datanucleus.store.rdbms.datasource.BoneCPDataSourceFactory.makePooledDataSource(BoneCPDataSourceFactory.java:73) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:217) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:110) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.init(ConnectionFactoryImpl.java:82) ... 119 more
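Cheng's "first on the classpath wins" explanation can be mimicked with a small shell sketch. The lookup function below is a toy stand-in for a classloader walking classpath entries in order; the directories, file names, and version strings are all invented for the demo.

```shell
# Toy demonstration of why placing guava-14.0.1.jar ahead of the
# EMR-provided guava-15.0.jar fixes the IllegalAccessError: whichever
# classpath entry contains the class FIRST wins resolution. We simulate
# jar lookup with plain files holding a version string.
CP_DIR=$(mktemp -d)
mkdir -p "$CP_DIR/guava14" "$CP_DIR/guava15"
echo "14.0.1" > "$CP_DIR/guava14/MapMaker.class"
echo "15.0"   > "$CP_DIR/guava15/MapMaker.class"

# A classloader walks entries in order and takes the first hit:
lookup() {
  local IFS=':'
  for entry in $1; do
    if [ -f "$entry/MapMaker.class" ]; then
      cat "$entry/MapMaker.class"
      return
    fi
  done
}

# Guava 14 listed first -> its class shadows 15's
RESOLVED=$(lookup "$CP_DIR/guava14:$CP_DIR/guava15")
echo "$RESOLVED"   # prints 14.0.1
```

With a real cluster, the equivalent move is putting the Guava 14.0.1 jar at the front of the driver classpath (for example via --driver-class-path), which is exactly what Zhun confirms worked in the follow-up.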
Re: Got error “java.lang.IllegalAccessError when using HiveContext in Spark shell on AWS
Hi Cheng, I replaced Guava 15.0 with Guava 14.0.1 in my spark classpath, the problem was solved. So your method is correct. It proved that this issue was caused by AWS EMR (ami-version 3.1.0) libs which include Guava 15.0. Many thanks and see you in the first Spark User Beijing Meetup tomorrow. -- Zhun Shen Data Mining at LightnInTheBox.com Email: shenzhunal...@gmail.com | shenz...@yahoo.com Phone: 186 0627 7769 GitHub: https://github.com/shenzhun LinkedIn: http://www.linkedin.com/in/shenzhun On August 7, 2014 at 6:57:06 PM, Cheng Lian (lian.cs@gmail.com) wrote: Hey Zhun, Thanks for the detailed problem description. Please see my comments inlined below. On Thu, Aug 7, 2014 at 6:18 PM, Zhun Shen shenzhunal...@gmail.com wrote: Caused by: java.lang.IllegalAccessError: tried to access method com.google.common.collect.MapMaker.makeComputingMap(Lcom/google/common/base/Function;)Ljava/util/concurrent/ConcurrentMap; from class com.jolbox.bonecp.BoneCPDataSource This line indicates that accessing MapMaker.makeComputingMap via Java reflection fails. The version of Guava we used in Spark SQL (as a transitive dependency) is 14.0.1. In this version, MapMaker.makeComputingMap is still public. But in newer versions (say 15.0), it’s no longer public. So my guess is that, a newer version of the Guava library in your classpath shadows the version Spark SQL uses somehow. A quick and dirty fix to see whether this is true is try putting Guava 14.0.1 jar file at the beginning of your classpath and see whether things work. 
at com.jolbox.bonecp.BoneCPDataSource.init(BoneCPDataSource.java:64) at org.datanucleus.store.rdbms.datasource.BoneCPDataSourceFactory.makePooledDataSource(BoneCPDataSourceFactory.java:73) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:217) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:110) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.init(ConnectionFactoryImpl.java:82) ... 119 more