HiveContext on Spark 1.6 Linkage Error: ClassCastException

2017-02-14 Thread Enrico DUrso
Hello guys,
hope all of you are OK.
I am trying to use HiveContext on Spark 1.6. I am developing in Eclipse and I
placed hive-site.xml on the classpath, so that I use the Hive instance running
on my cluster instead of creating a local metastore and a local warehouse.
So far so good: in this scenario, select * and insert into queries work fine,
but the problem arises when trying to drop tables and/or create new ones.
Provided that it is not a permission problem, my issue is:
ClassCastException: attempting to cast jar 
file://.../com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar!javax/ws/rs/ext/RunTimeDelegate.class
 to jar cast jar 
file://.../com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar!javax/ws/rs/ext/RunTimeDelegate.class.

As you can see, it is attempting to cast the same jar to itself, and the
exception is thrown, I think, because the same class has been loaded before by a
different classloader: one copy is loaded by
org.apache.spark.sql.hive.client.IsolatedClientLoader and the other one by
sun.misc.Launcher$AppClassLoader.

Any suggestions to fix this issue? The same happens when building the jar and
running it with spark-submit (YARN RM).
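
A sketch of two workarounds that are sometimes suggested for this kind of
classloader clash between Spark's isolated Hive client loader and the
application classloader; treat it as an untested starting point rather than a
confirmed fix, and note that the shared-prefix list shown here is an assumption:

val conf = new org.apache.spark.SparkConf()
  // let the isolated metastore client loader share the jersey/JAX-RS classes
  // with the application classloader (list extends the usual JDBC defaults)
  .set("spark.sql.hive.metastore.sharedPrefixes",
    "com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc," +
    "com.sun.jersey,javax.ws.rs")
  // alternatively, stop the Hive client from pulling in the YARN timeline (ATS)
  // client, which is often what drags jersey onto the DDL code path in this setup
  .set("spark.hadoop.yarn.timeline-service.enabled", "false")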

Cheers,

best



Re: Creating HiveContext within Spark streaming

2016-09-08 Thread Mich Talebzadeh
lowMultipleContexts", "true").
>>  set("spark.hadoop.validateOutputSpecs", "false")
>>  // change the values accordingly.
>>  sparkConf.set("sparkDefaultParllelism",
>> sparkDefaultParallelismValue)
>>  sparkConf.set("sparkSerializer", sparkSerializerValue)
>>  sparkConf.set("sparkNetworkTimeOut",
>> sparkNetworkTimeOutValue)
>>  // If you want to see more details of batches please
>> increase the value
>>  // and that will be shown UI.
>>  sparkConf.set("sparkStreamingUiRetainedBatches",
>>sparkStreamingUiRetainedBatchesValue)
>>  sparkConf.set("sparkWorkerUiRetainedDrivers",
>>sparkWorkerUiRetainedDriversValue)
>>  sparkConf.set("sparkWorkerUiRetainedExecutors",
>>sparkWorkerUiRetainedExecutorsValue)
>>  sparkConf.set("sparkWorkerUiRetainedStages",
>>sparkWorkerUiRetainedStagesValue)
>>  sparkConf.set("sparkUiRetainedJobs",
>> sparkUiRetainedJobsValue)
>>  sparkConf.set("enableHiveSupport",enableHiveSupportValue)
>>  sparkConf.set("spark.streaming.stopGracefullyOnShutdown","tr
>> ue")
>>  sparkConf.set("spark.streaming.receiver.writeAheadLog.enable",
>> "true")
>>  
>> sparkConf.set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite",
>> "true")
>>  
>> sparkConf.set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite",
>> "true")
>>  var sqltext = ""
>>  val batchInterval = 2
>>  val streamingContext = new StreamingContext(sparkConf,
>> Seconds(batchInterval))
>>
>> With the above settings,  Spark streaming works fine. *However, after
>> adding the first line below (in red)*
>>
>> *val sparkContext  = new SparkContext(sparkConf)*
>> val HiveContext = new HiveContext(streamingContext.sparkContext)
>>
>> I get the following errors:
>>
>> 16/09/08 14:02:32 ERROR JobScheduler: Error running job streaming job
>> 1473339752000 ms.0
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task
>> 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage
>> 0.0 (TID 7, 50.140.197.217): java.io.IOException:
>> *org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0*
>> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1260)
>> at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174)
>> at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
>> at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
>> at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
>> at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:67)
>> at org.apache.spark.scheduler.Task.run(Task.scala:85)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
>>
>>
>> Hm any ideas?
>>
>> Thanks
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary 

Re: Creating HiveContext within Spark streaming

2016-09-08 Thread Todd Nist
runTask(ResultTask.
> scala:67)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:274)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Failed to get
> broadcast_0_piece0 of broadcast_0
>
>
> Hm any ideas?
>
> Thanks
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 8 September 2016 at 12:28, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>>
>> Hi,
>>
>> This may not be feasible in Spark streaming.
>>
>> I am trying to create a HiveContext in Spark streaming within the
>> streaming context
>>
>> // Create a local StreamingContext with two working thread and batch
>> interval of 2 seconds.
>>
>>  val sparkConf = new SparkConf().
>>  setAppName(sparkAppName).
>>  set("spark.driver.allowMultipleContexts", "true").
>>  set("spark.hadoop.validateOutputSpecs", "false")
>> .
>>
>> Now try to create an sc
>>
>> val sc = new SparkContext(sparkConf)
>> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>
>> This is accepted but it creates two spark jobs
>>
>>
>> [image: Inline images 1]
>>
>> And basically it goes to a waiting state
>>
>> Any ideas how one  can create a HiveContext within Spark streaming?
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>
>


Re: Creating HiveContext within Spark streaming

2016-09-08 Thread Mich Talebzadeh
Ok I managed to sort that one out.

This is what I am facing

 val sparkConf = new SparkConf().
 setAppName(sparkAppName).
 set("spark.driver.allowMultipleContexts", "true").
 set("spark.hadoop.validateOutputSpecs", "false")
 // change the values accordingly.
 sparkConf.set("sparkDefaultParllelism",
sparkDefaultParallelismValue)
 sparkConf.set("sparkSerializer", sparkSerializerValue)
 sparkConf.set("sparkNetworkTimeOut", sparkNetworkTimeOutValue)
 // If you want to see more details of batches please increase
the value
 // and that will be shown UI.
 sparkConf.set("sparkStreamingUiRetainedBatches",
   sparkStreamingUiRetainedBatchesValue)
 sparkConf.set("sparkWorkerUiRetainedDrivers",
   sparkWorkerUiRetainedDriversValue)
 sparkConf.set("sparkWorkerUiRetainedExecutors",
   sparkWorkerUiRetainedExecutorsValue)
 sparkConf.set("sparkWorkerUiRetainedStages",
   sparkWorkerUiRetainedStagesValue)
 sparkConf.set("sparkUiRetainedJobs", sparkUiRetainedJobsValue)
 sparkConf.set("enableHiveSupport",enableHiveSupportValue)

sparkConf.set("spark.streaming.stopGracefullyOnShutdown","true")
 sparkConf.set("spark.streaming.receiver.writeAheadLog.enable",
"true")

sparkConf.set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite",
"true")

sparkConf.set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite",
"true")
 var sqltext = ""
 val batchInterval = 2
 val streamingContext = new StreamingContext(sparkConf,
Seconds(batchInterval))

With the above settings,  Spark streaming works fine. *However, after
adding the first line below (in red)*

*val sparkContext  = new SparkContext(sparkConf)*
val HiveContext = new HiveContext(streamingContext.sparkContext)

I get the following errors:

16/09/08 14:02:32 ERROR JobScheduler: Error running job streaming job
1473339752000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1
in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage
0.0 (TID 7, 50.140.197.217): java.io.IOException:
*org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0*
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1260)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:67)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0


Hm any ideas?

Thanks
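
One way to sidestep the second context, sketched here as an untested outline
using the sparkAppName and batchInterval values from the snippet above: create a
single SparkContext up front and derive both the StreamingContext and the
HiveContext from it, so another SparkContext is never constructed.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName(sparkAppName)   // plus the settings above
val sc = new SparkContext(sparkConf)                       // the only SparkContext
val streamingContext = new StreamingContext(sc, Seconds(batchInterval))
val hiveContext = new HiveContext(streamingContext.sparkContext)  // reuse it, do not recreate
// define the DStreams here, then streamingContext.start() and awaitTermination()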





Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 8 September 2016 at 12:28, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

>
> Hi,
>
> This may not be feasible in Spark streaming.
>
> I am trying to create a HiveContext in Spark streaming within the
> streaming context
>
> // Create a local StreamingContext with two working thread and batch
> interval of 2 seconds.
>
>  val sparkConf = new SparkConf().
>  setAppName(sparkAppName).
>  set("spark.driver.allowMultipleContexts", "true").
>  set("spark.hadoop.validateOutputSpecs", "false")
> .....
>
> Now try to create an sc
>
> val sc = new 

Creating HiveContext within Spark streaming

2016-09-08 Thread Mich Talebzadeh
Hi,

This may not be feasible in Spark streaming.

I am trying to create a HiveContext in Spark streaming within the streaming
context

// Create a local StreamingContext with two working thread and batch
interval of 2 seconds.

 val sparkConf = new SparkConf().
 setAppName(sparkAppName).
 set("spark.driver.allowMultipleContexts", "true").
 set("spark.hadoop.validateOutputSpecs", "false")
.

Now try to create an sc

val sc = new SparkContext(sparkConf)
val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

This is accepted but it creates two spark jobs


[image: Inline images 1]

And basically it goes to a waiting state

Any ideas how one  can create a HiveContext within Spark streaming?

Thanks






Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


HiveContext in spark

2016-04-12 Thread Selvam Raman
I am not able to use the insert, update and delete commands in HiveContext.

I am using Spark 1.6.1 and Hive 1.1.0.

Please find the error below.



scala> hc.sql("delete from  trans_detail where counter=1");
16/04/12 14:58:45 INFO ParseDriver: Parsing command: delete from
 trans_detail where counter=1
16/04/12 14:58:45 INFO ParseDriver: Parse Completed
16/04/12 14:58:45 INFO ParseDriver: Parsing command: delete from
 trans_detail where counter=1
16/04/12 14:58:45 INFO ParseDriver: Parse Completed
16/04/12 14:58:45 INFO BlockManagerInfo: Removed broadcast_2_piece0 on
localhost:60409 in memory (size: 46.9 KB, free: 536.7 MB)
16/04/12 14:58:46 INFO ContextCleaner: Cleaned accumulator 3
16/04/12 14:58:46 INFO BlockManagerInfo: Removed broadcast_4_piece0 on
localhost:60409 in memory (size: 3.6 KB, free: 536.7 MB)
org.apache.spark.sql.AnalysisException:
Unsupported language features in query: delete from  trans_detail where
counter=1
TOK_DELETE_FROM 1, 0,11, 13
  TOK_TABNAME 1, 5,5, 13
trans_detail 1, 5,5, 13
  TOK_WHERE 1, 7,11, 39
= 1, 9,11, 39
  TOK_TABLE_OR_COL 1, 9,9, 32
counter 1, 9,9, 32
  1 1, 11,11, 40

scala.NotImplementedError: No parse rules for TOK_DELETE_FROM:
 TOK_DELETE_FROM 1, 0,11, 13
  TOK_TABNAME 1, 5,5, 13
trans_detail 1, 5,5, 13
  TOK_WHERE 1, 7,11, 39
= 1, 9,11, 39
  TOK_TABLE_OR_COL 1, 9,9, 32
counter 1, 9,9, 32
  1 1, 11,11, 40

org.apache.spark.sql.hive.HiveQl$.nodeToPlan(HiveQl.scala:1217)
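
Spark 1.6 has no DELETE or UPDATE support in HiveContext, so a common workaround
is to rewrite the data without the unwanted rows. A minimal sketch, assuming the
table and column from the query above and writing the result to a new table:

// hc is the HiveContext from the session above
val remaining = hc.table("trans_detail").filter("counter <> 1")
remaining.write.mode("overwrite").saveAsTable("trans_detail_cleaned")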



-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Re: Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Gavin Yue
This sqlContext is an instance of HiveContext; do not be confused by the name.



> On Feb 16, 2016, at 12:51, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
> 
> Hi All,
> 
> On creating HiveContext in spark-shell, fails with 
> 
> Caused by: ERROR XSDB6: Another instance of Derby may have already booted the 
> database /SPARK/metastore_db.
> 
> Spark-Shell already has created metastore_db for SqlContext. 
> 
> Spark context available as sc.
> SQL context available as sqlContext.
> 
> But without HiveContext, i am able to query the data using SqlContext . 
> 
> scala>  var df = 
> sqlContext.read.format("com.databricks.spark.csv").option("header", 
> "true").option("inferSchema", "true").load("/SPARK/abc")
> df: org.apache.spark.sql.DataFrame = [Prabhu: string, Joseph: string]
> 
> So is there any real need for HiveContext inside Spark Shell. Is everything 
> that can be done with HiveContext, achievable with SqlContext inside Spark 
> Shell.
> 
> 
> 
> Thanks,
> Prabhu Joseph
> 
> 
> 
> 

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Prabhu Joseph
Thanks Mark, that answers my question.

On Tue, Feb 16, 2016 at 10:55 AM, Mark Hamstra <m...@clearstorydata.com>
wrote:

> Welcome to
>
>     __
>
>  / __/__  ___ _/ /__
>
> _\ \/ _ \/ _ `/ __/  '_/
>
>/___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
>
>   /_/
>
>
>
> Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.8.0_72)
>
> Type in expressions to have them evaluated.
>
> Type :help for more information.
>
>
> scala> sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]
>
> res0: Boolean = true
>
>
>
> On Mon, Feb 15, 2016 at 8:51 PM, Prabhu Joseph <prabhujose.ga...@gmail.com
> > wrote:
>
>> Hi All,
>>
>> On creating HiveContext in spark-shell, fails with
>>
>> Caused by: ERROR XSDB6: Another instance of Derby may have already booted
>> the database /SPARK/metastore_db.
>>
>> Spark-Shell already has created metastore_db for SqlContext.
>>
>> Spark context available as sc.
>> SQL context available as sqlContext.
>>
>> But without HiveContext, i am able to query the data using SqlContext .
>>
>> scala>  var df =
>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>> "true").option("inferSchema", "true").load("/SPARK/abc")
>> df: org.apache.spark.sql.DataFrame = [Prabhu: string, Joseph: string]
>>
>> So is there any real need for HiveContext inside Spark Shell. Is
>> everything that can be done with HiveContext, achievable with SqlContext
>> inside Spark Shell.
>>
>>
>>
>> Thanks,
>> Prabhu Joseph
>>
>>
>>
>>
>>
>


Re: Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Mark Hamstra
Welcome to

    __

 / __/__  ___ _/ /__

_\ \/ _ \/ _ `/ __/  '_/

   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT

  /_/



Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java
1.8.0_72)

Type in expressions to have them evaluated.

Type :help for more information.


scala> sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]

res0: Boolean = true



On Mon, Feb 15, 2016 at 8:51 PM, Prabhu Joseph <prabhujose.ga...@gmail.com>
wrote:

> Hi All,
>
> On creating HiveContext in spark-shell, fails with
>
> Caused by: ERROR XSDB6: Another instance of Derby may have already booted
> the database /SPARK/metastore_db.
>
> Spark-Shell already has created metastore_db for SqlContext.
>
> Spark context available as sc.
> SQL context available as sqlContext.
>
> But without HiveContext, i am able to query the data using SqlContext .
>
> scala>  var df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").load("/SPARK/abc")
> df: org.apache.spark.sql.DataFrame = [Prabhu: string, Joseph: string]
>
> So is there any real need for HiveContext inside Spark Shell. Is
> everything that can be done with HiveContext, achievable with SqlContext
> inside Spark Shell.
>
>
>
> Thanks,
> Prabhu Joseph
>
>
>
>
>


Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Prabhu Joseph
Hi All,

Creating HiveContext in spark-shell fails with

Caused by: ERROR XSDB6: Another instance of Derby may have already booted
the database /SPARK/metastore_db.

Spark-Shell already has created metastore_db for SqlContext.

Spark context available as sc.
SQL context available as sqlContext.

But without HiveContext, I am able to query the data using SqlContext.

scala>  var df =
sqlContext.read.format("com.databricks.spark.csv").option("header",
"true").option("inferSchema", "true").load("/SPARK/abc")
df: org.apache.spark.sql.DataFrame = [Prabhu: string, Joseph: string]

So is there any real need for HiveContext inside Spark Shell? Is everything
that can be done with HiveContext achievable with SqlContext inside Spark
Shell?



Thanks,
Prabhu Joseph


Sharing HiveContext in Spark JobServer / getOrCreate

2016-01-25 Thread Deenar Toraskar
Hi

I am using a shared sparkContext for all of my Spark jobs. Some of the jobs
use HiveContext, but there isn't a getOrCreate method on HiveContext which
will allow reuse of an existing HiveContext. Such a method exists on
SQLContext only (def getOrCreate(sparkContext: SparkContext): SQLContext).

Is there any reason that a HiveContext cannot be shared amongst multiple
threads within the same Spark driver process?

In addition, I cannot seem to be able to cast a HiveContext to a SQLContext,
but this works fine in the spark shell. Am I doing something wrong here?

scala> sqlContext

res19: org.apache.spark.sql.SQLContext =
org.apache.spark.sql.hive.HiveContext@383b3357

scala> import org.apache.spark.sql.SQLContext

import org.apache.spark.sql.SQLContext

scala> SQLContext.getOrCreate(sc)

res18: org.apache.spark.sql.SQLContext =
org.apache.spark.sql.hive.HiveContext@383b3357



Regards
Deenar
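
A minimal sketch of a getOrCreate-style holder, assuming a single driver-side
HiveContext is acceptable for all jobs; note also that HiveContext extends
SQLContext, so treating it as a SQLContext needs no cast:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

object HiveContextHolder {
  @volatile private var instance: HiveContext = _
  // lazily create one HiveContext per driver and hand the same one to every caller
  def getOrCreate(sc: SparkContext): HiveContext = synchronized {
    if (instance == null) instance = new HiveContext(sc)
    instance
  }
}

val hive = HiveContextHolder.getOrCreate(sc)
val asSql: SQLContext = hive   // upcast works because HiveContext extends SQLContext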


Re: Sharing HiveContext in Spark JobServer / getOrCreate

2016-01-25 Thread Ted Yu
Have you noticed the following method of HiveContext ?

   * Returns a new HiveContext as new session, which will have separated
SQLConf, UDF/UDAF,
   * temporary tables and SessionState, but sharing the same CacheManager,
IsolatedClientLoader
   * and Hive client (both of execution and metadata) with existing
HiveContext.
   */
  override def newSession(): HiveContext = {
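
A short usage sketch, assuming a parent HiveContext already exists:

val sharedHive = new org.apache.spark.sql.hive.HiveContext(sc)
val perJobHive = sharedHive.newSession()  // own SQLConf, UDFs and temp tables,
                                          // shared CacheManager and Hive client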

Cheers

On Mon, Jan 25, 2016 at 7:22 AM, Deenar Toraskar 
wrote:

> Hi
>
> I am using a shared sparkContext for all of my Spark jobs. Some of the
> jobs use HiveContext, but there isn't a getOrCreate method on HiveContext
> which will allow reuse of an existing HiveContext. Such a method exists on
> SQLContext only (def getOrCreate(sparkContext: SparkContext): SQLContext).
>
> Is there any reason that a HiveContext cannot be shared amongst multiple
> threads within the same Spark driver process?
>
> In addition I cannot seem to be able to cast a HiveContext to a
> SQLContext, but this works fine in the spark shell, I am doing something
> wrong here?
>
> scala> sqlContext
>
> res19: org.apache.spark.sql.SQLContext =
> org.apache.spark.sql.hive.HiveContext@383b3357
>
> scala> import org.apache.spark.sql.SQLContext
>
> import org.apache.spark.sql.SQLContext
>
> scala> SQLContext.getOrCreate(sc)
>
> res18: org.apache.spark.sql.SQLContext =
> org.apache.spark.sql.hive.HiveContext@383b3357
>
>
>
> Regards
> Deenar
>


Re: Sharing HiveContext in Spark JobServer / getOrCreate

2016-01-25 Thread Deenar Toraskar
On 25 January 2016 at 21:09, Deenar Toraskar <
deenar.toras...@thinkreactive.co.uk> wrote:

> No I hadn't. This is useful, but in some cases we do want to share the
> same temporary tables between jobs, so I really wanted a getOrCreate
> equivalent on HiveContext.
>
> Deenar
>
>
>
> On 25 January 2016 at 18:10, Ted Yu  wrote:
>
>> Have you noticed the following method of HiveContext ?
>>
>>* Returns a new HiveContext as new session, which will have separated
>> SQLConf, UDF/UDAF,
>>* temporary tables and SessionState, but sharing the same
>> CacheManager, IsolatedClientLoader
>>* and Hive client (both of execution and metadata) with existing
>> HiveContext.
>>*/
>>   override def newSession(): HiveContext = {
>>
>> Cheers
>>
>> On Mon, Jan 25, 2016 at 7:22 AM, Deenar Toraskar <
>> deenar.toras...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I am using a shared sparkContext for all of my Spark jobs. Some of the
>>> jobs use HiveContext, but there isn't a getOrCreate method on HiveContext
>>> which will allow reuse of an existing HiveContext. Such a method exists on
>>> SQLContext only (def getOrCreate(sparkContext: SparkContext):
>>> SQLContext).
>>>
>>> Is there any reason that a HiveContext cannot be shared amongst multiple
>>> threads within the same Spark driver process?
>>>
>>> In addition I cannot seem to be able to cast a HiveContext to a
>>> SQLContext, but this works fine in the spark shell, I am doing something
>>> wrong here?
>>>
>>> scala> sqlContext
>>>
>>> res19: org.apache.spark.sql.SQLContext =
>>> org.apache.spark.sql.hive.HiveContext@383b3357
>>>
>>> scala> import org.apache.spark.sql.SQLContext
>>>
>>> import org.apache.spark.sql.SQLContext
>>>
>>> scala> SQLContext.getOrCreate(sc)
>>>
>>> res18: org.apache.spark.sql.SQLContext =
>>> org.apache.spark.sql.hive.HiveContext@383b3357
>>>
>>>
>>>
>>> Regards
>>> Deenar
>>>
>>
>>
>


Re: HiveContext test, Spark Context did not initialize after waiting 10000ms

2015-05-26 Thread Mohammad Islam
I got a similar problem. I'm not sure if your problem is already resolved.
For the record, I solved this type of error by calling sc.setMaster("yarn-cluster").
If you find the solution, please let us know.
Regards,
Mohammad
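
Spelled out as a sketch, assuming the master is set on the SparkConf before the
context is created (the exact master value depends on the deployment):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf()
  .setAppName("HiveSparkIntegrationTest")
  .setMaster("yarn-cluster")        // the workaround described above
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)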




 On Friday, March 6, 2015 2:47 PM, nitinkak001 nitinkak...@gmail.com 
wrote:
   

 I am trying to run a Hive query from Spark using HiveContext. Here is the
code

/ val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")

    conf.set("spark.executor.extraClassPath",
    "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib");
    conf.set("spark.driver.extraClassPath",
    "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib");
    conf.set("spark.yarn.am.waitTime", "30L")

    val sc = new SparkContext(conf)

    val sqlContext = new HiveContext(sc)

    def inputRDD = sqlContext.sql("describe spark_poc.src_digital_profile_user");

    inputRDD.collect().foreach { println }

    println(inputRDD.schema.getClass.getName)
/

Getting this exception. Any clues? The weird part is if I try to do the same
thing but in Java instead of Scala, it runs fine.

/Exception in thread Driver java.lang.NullPointerException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not
initialize after waiting for 1 ms. Please check earlier log output for
errors. Failing the application.
Exception in thread main java.lang.NullPointerException
    at
org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218)
    at
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110)
    at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434)
    at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
    at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
    at
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433)
    at
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a signal./



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-test-Spark-Context-did-not-initialize-after-waiting-1ms-tp21953.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



  

Re: HiveContext test, Spark Context did not initialize after waiting 10000ms

2015-05-26 Thread Nitin kak
That is a much better solution than how I resolved it. I got around it by
placing comma-separated jar paths for all the Hive-related jars in the --jars
clause.

I will try your solution. Thanks for sharing it.

On Tue, May 26, 2015 at 4:14 AM, Mohammad Islam misla...@yahoo.com wrote:

 I got a similar problem.
 I'm not sure if your problem is already resolved.

 For the record, I solved this type of error by calling sc..setMaster(
 yarn-cluster);

 If you find the solution, please let us know.

 Regards,
 Mohammad





   On Friday, March 6, 2015 2:47 PM, nitinkak001 nitinkak...@gmail.com
 wrote:


 I am trying to run a Hive query from Spark using HiveContext. Here is the
 code

 / val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")

 conf.set("spark.executor.extraClassPath",
 "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib");
 conf.set("spark.driver.extraClassPath",
 "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib");
 conf.set("spark.yarn.am.waitTime", "30L")

 val sc = new SparkContext(conf)

 val sqlContext = new HiveContext(sc)

 def inputRDD = sqlContext.sql("describe spark_poc.src_digital_profile_user");

 inputRDD.collect().foreach { println }

 println(inputRDD.schema.getClass.getName)
 /

 Getting this exception. Any clues? The weird part is if I try to do the
 same
 thing but in Java instead of Scala, it runs fine.

 /Exception in thread Driver java.lang.NullPointerException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at

 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
 15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not
 initialize after waiting for 1 ms. Please check earlier log output for
 errors. Failing the application.
 Exception in thread main java.lang.NullPointerException
 at

 org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218)
 at

 org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110)
 at

 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434)
 at

 org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
 at

 org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at

 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at

 org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
 at

 org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433)
 at

 org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
 15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a
 signal./



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-test-Spark-Context-did-not-initialize-after-waiting-1ms-tp21953.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org






HiveContext test, Spark Context did not initialize after waiting 10000ms

2015-03-06 Thread nitinkak001
I am trying to run a Hive query from Spark using HiveContext. Here is the
code

/ val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")

conf.set("spark.executor.extraClassPath",
"/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib");
conf.set("spark.driver.extraClassPath",
"/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib");
conf.set("spark.yarn.am.waitTime", "30L")

val sc = new SparkContext(conf)

val sqlContext = new HiveContext(sc)

def inputRDD = sqlContext.sql("describe spark_poc.src_digital_profile_user");

inputRDD.collect().foreach { println }

println(inputRDD.schema.getClass.getName)
/

Getting this exception. Any clues? The weird part is if I try to do the same
thing but in Java instead of Scala, it runs fine.

/Exception in thread Driver java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not
initialize after waiting for 1 ms. Please check earlier log output for
errors. Failing the application.
Exception in thread main java.lang.NullPointerException
at
org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218)
at
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434)
at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433)
at
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a signal./



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-test-Spark-Context-did-not-initialize-after-waiting-1ms-tp21953.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: HiveContext test, Spark Context did not initialize after waiting 10000ms

2015-03-06 Thread Marcelo Vanzin
On Fri, Mar 6, 2015 at 2:47 PM, nitinkak001 nitinkak...@gmail.com wrote:
 I am trying to run a Hive query from Spark using HiveContext. Here is the
 code

 / val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")

 conf.set("spark.executor.extraClassPath",
 "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib");
 conf.set("spark.driver.extraClassPath",
 "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib");
 conf.set("spark.yarn.am.waitTime", "30L")

You're missing /* at the end of your classpath entries. Also, since
you're on CDH 5.2, you'll probably need to filter out the guava jar
from Hive's lib directory, otherwise things might break. So things
will get a little more complicated.

With CDH 5.3 you shouldn't need to filter out the guava jar.
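
For reference, a corrected sketch of those two settings with the wildcard added
(the CDH path is the one from the original post):

import org.apache.spark.SparkConf

val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")
conf.set("spark.executor.extraClassPath",
  "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib/*")
conf.set("spark.driver.extraClassPath",
  "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib/*")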

-- 
Marcelo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread tridib
Help please!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18280.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread Jimmy McErlain
Can you be more specific: what version of Spark, Hive, Hadoop, etc. are you
using? What are you trying to do? What are the issues you are seeing?
J




*JIMMY MCERLAIN*

DATA SCIENTIST (NERD)

*E*: ji...@sellpoints.com

*M*: *510.303.7751*

On Thu, Nov 6, 2014 at 9:22 AM, tridib tridib.sama...@live.com wrote:

 Help please!



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18280.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread Terry Siu
What version of Spark are you using? Did you compile your Spark version
and if so, what compile options did you use?

On 11/6/14, 9:22 AM, tridib tridib.sama...@live.com wrote:

Help please!



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveCont
ext-in-spark-shell-tp18261p18280.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: Unable to use HiveContext in spark-shell

2014-11-06 Thread Tridib Samanta
 HiveContext in spark-shell
 Date: Thu, 6 Nov 2014 17:38:51 +
  
 What version of Spark are you using? Did you compile your Spark version
 and if so, what compile options did you use?
 
 On 11/6/14, 9:22 AM, tridib tridib.sama...@live.com wrote:
 
 Help please!
 
 
 
 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveCont
 ext-in-spark-shell-tp18261p18280.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 
 

  

Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread Terry Siu
Those are the same options I used, except I had --tgz to package it and I built
off of the master branch. Unfortunately, my only guess is that these errors
stem from your build environment. In your spark assembly, do you have any
classes which belong to the org.apache.hadoop.hive package?


From: Tridib Samanta <tridib.sama...@live.com>
Date: Thursday, November 6, 2014 at 9:49 AM
To: Terry Siu <terry@smartfocus.com>, u...@spark.incubator.apache.org
Subject: RE: Unable to use HiveContext in spark-shell

I am using spark 1.1.0.
I built it using:
./make-distribution.sh -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive 
-DskipTests

My ultimate goal is to execute a query on a parquet file with a nested structure
and cast a date string to Date; this is required to calculate the age of the
Person entity. But I am even unable to get past this line:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
I made sure that the org.apache.hadoop package is in the spark assembly jar.

Re-attaching the stack trace for quick reference.

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

error: bad symbolic reference. A signature in HiveContext.class refers to term 
hive
in package org.apache.hadoop which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling 
HiveContext.class.
error:
 while compiling: console
during phase: erasure
 library version: version 2.10.4
compiler version: version 2.10.4
  reconstructed args:

  last tree to typer: Apply(value $outer)
  symbol: value $outer (flags: method synthetic stable 
expandedname triedcooking)
   symbol definition: val $outer(): $iwC.$iwC.type
 tpe: $iwC.$iwC.type
   symbol owners: value $outer - class $iwC - class $iwC - class $iwC - 
class $read - package $line5
  context owners: class $iwC - class $iwC - class $iwC - class $iwC - 
class $read - package $line5

== Enclosing template or block ==

ClassDef( // class $iwC extends Serializable
  0
  $iwC
  []
  Template( // val local $iwC: notype, tree.tpe=$iwC
java.lang.Object, scala.Serializable // parents
ValDef(
  private
  _
  tpt
  empty
)
// 5 statements
DefDef( // def init(arg$outer: $iwC.$iwC.$iwC.type): $iwC
  method triedcooking
  init
  []
  // 1 parameter list
  ValDef( // $outer: $iwC.$iwC.$iwC.type

$outer
tpt // tree.tpe=$iwC.$iwC.$iwC.type
empty
  )
  tpt // tree.tpe=$iwC
  Block( // tree.tpe=Unit
Apply( // def init(): Object in class Object, tree.tpe=Object
  $iwC.super.init // def init(): Object in class Object, 
tree.tpe=()Object
  Nil
)
()
  )
)
ValDef( // private[this] val sqlContext: 
org.apache.spark.sql.hive.HiveContext
  private local triedcooking
  sqlContext 
  tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext
  Apply( // def init(sc: org.apache.spark.SparkContext): 
org.apache.spark.sql.hive.HiveContext in class HiveContext, 
tree.tpe=org.apache.spark.sql.hive.HiveContext
new org.apache.spark.sql.hive.HiveContext.init // def init(sc: 
org.apache.spark.SparkContext): org.apache.spark.sql.hive.HiveContext in class 
HiveContext, tree.tpe=(sc: 
org.apache.spark.SparkContext)org.apache.spark.sql.hive.HiveContext
Apply( // val sc(): org.apache.spark.SparkContext, 
tree.tpe=org.apache.spark.SparkContext
  
$iwC.this.$line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$$outer().$VAL1().$iw().$iw().sc
 // val sc(): org.apache.spark.SparkContext, 
tree.tpe=()org.apache.spark.SparkContext
  Nil
)
  )
)
DefDef( // val sqlContext(): org.apache.spark.sql.hive.HiveContext
  method stable accessor
  sqlContext
  []
  List(Nil)
  tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext
  $iwC.this.sqlContext  // private[this] val sqlContext: 
org.apache.spark.sql.hive.HiveContext, 
tree.tpe=org.apache.spark.sql.hive.HiveContext
)
ValDef( // protected val $outer: $iwC.$iwC.$iwC.type
  protected synthetic paramaccessor triedcooking
  $outer 
  tpt // tree.tpe=$iwC.$iwC.$iwC.type
  empty
)
DefDef( // val $outer(): $iwC.$iwC.$iwC.type
  method synthetic stable expandedname triedcooking
  $line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer
  []
  List(Nil)
  tpt // tree.tpe=Any
  $iwC.this.$outer  // protected val $outer: $iwC.$iwC.$iwC.type, 
tree.tpe=$iwC.$iwC.$iwC.type
)
  )
)

== Expanded type of tree ==

ThisType(class $iwC)

uncaught exception during compilation: scala.reflect.internal.Types$TypeError
scala.reflect.internal.Types

Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread tridib
Yes. I have org.apache.hadoop.hive package in spark assembly.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18322.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread tridib
I built spark-1.1.0 on a fresh new machine and this issue is gone! Thank you all
for your help.

Thanks & Regards
Tridib



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261p18324.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Unable to use HiveContext in spark-shell

2014-11-05 Thread tridib
I am connecting to a remote master using the spark-shell. Then I get the
following error while trying to instantiate HiveContext.

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

error: bad symbolic reference. A signature in HiveContext.class refers to
term hive
in package org.apache.hadoop which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling
HiveContext.class.
error:
 while compiling: console
during phase: erasure
 library version: version 2.10.4
compiler version: version 2.10.4
  reconstructed args:

  last tree to typer: Apply(value $outer)
  symbol: value $outer (flags: method synthetic stable
expandedname triedcooking)
   symbol definition: val $outer(): $iwC.$iwC.type
 tpe: $iwC.$iwC.type
   symbol owners: value $outer - class $iwC - class $iwC - class $iwC
- class $read - package $line5
  context owners: class $iwC - class $iwC - class $iwC - class $iwC
- class $read - package $line5

== Enclosing template or block ==

ClassDef( // class $iwC extends Serializable
  0
  $iwC
  []
  Template( // val local $iwC: notype, tree.tpe=$iwC
java.lang.Object, scala.Serializable // parents
ValDef(
  private
  _
  tpt
  empty
)
// 5 statements
DefDef( // def init(arg$outer: $iwC.$iwC.$iwC.type): $iwC
  method triedcooking
  init
  []
  // 1 parameter list
  ValDef( // $outer: $iwC.$iwC.$iwC.type

$outer
tpt // tree.tpe=$iwC.$iwC.$iwC.type
empty
  )
  tpt // tree.tpe=$iwC
  Block( // tree.tpe=Unit
Apply( // def init(): Object in class Object, tree.tpe=Object
  $iwC.super.init // def init(): Object in class Object,
tree.tpe=()Object
  Nil
)
()
  )
)
ValDef( // private[this] val sqlContext:
org.apache.spark.sql.hive.HiveContext
  private local triedcooking
  sqlContext 
  tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext
  Apply( // def init(sc: org.apache.spark.SparkContext):
org.apache.spark.sql.hive.HiveContext in class HiveContext,
tree.tpe=org.apache.spark.sql.hive.HiveContext
new org.apache.spark.sql.hive.HiveContext.init // def init(sc:
org.apache.spark.SparkContext): org.apache.spark.sql.hive.HiveContext in
class HiveContext, tree.tpe=(sc:
org.apache.spark.SparkContext)org.apache.spark.sql.hive.HiveContext
Apply( // val sc(): org.apache.spark.SparkContext,
tree.tpe=org.apache.spark.SparkContext
 
$iwC.this.$line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$iwC$$$outer().$line5$$read$$iwC$$iwC$$$outer().$VAL1().$iw().$iw().sc
// val sc(): org.apache.spark.SparkContext,
tree.tpe=()org.apache.spark.SparkContext
  Nil
)
  )
)
DefDef( // val sqlContext(): org.apache.spark.sql.hive.HiveContext
  method stable accessor
  sqlContext
  []
  List(Nil)
  tpt // tree.tpe=org.apache.spark.sql.hive.HiveContext
  $iwC.this.sqlContext  // private[this] val sqlContext:
org.apache.spark.sql.hive.HiveContext,
tree.tpe=org.apache.spark.sql.hive.HiveContext
)
ValDef( // protected val $outer: $iwC.$iwC.$iwC.type
  protected synthetic paramaccessor triedcooking
  $outer 
  tpt // tree.tpe=$iwC.$iwC.$iwC.type
  empty
)
DefDef( // val $outer(): $iwC.$iwC.$iwC.type
  method synthetic stable expandedname triedcooking
  $line5$$read$$iwC$$iwC$$iwC$$iwC$$$outer
  []
  List(Nil)
  tpt // tree.tpe=Any
  $iwC.this.$outer  // protected val $outer: $iwC.$iwC.$iwC.type,
tree.tpe=$iwC.$iwC.$iwC.type
)
  )
)

== Expanded type of tree ==

ThisType(class $iwC)

uncaught exception during compilation:
scala.reflect.internal.Types$TypeError
scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature
in HiveContext.class refers to term conf
in value org.apache.hadoop.hive which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling
HiveContext.class.
That entry seems to have slain the compiler.  Shall I replay
your session? I can re-run each line except the last one.
[y/n]


Thanks
Tridib



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-use-HiveContext-in-spark-shell-tp18261.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Got error "java.lang.IllegalAccessError" when using HiveContext in Spark shell on AWS

2014-08-07 Thread Zhun Shen
Hi,

When I try to use HiveContext in Spark shell on AWS, I got the error
java.lang.IllegalAccessError: tried to access method
com.google.common.collect.MapMaker.makeComputingMap(Lcom/google/common/base/Function;)Ljava/util/concurrent/ConcurrentMap.

I follow the steps below to compile and install Spark(ps. I test 1.0.0,
1.0.1 and 1.0.2).

Step 1:
./make-distribution.sh --hadoop 2.4.0 --with-hive --tgz
Success!

Step 2:
elastic-mapreduce --create --alive --name "Spark Test" --ami-version 3.1.0
--instance-type m3.xlarge --instance-count 2
Hadoop version: 2.4.0
Hive: 0.11.0

Success !

3.
wget --no-check-certificate
https://s3.amazonaws.com/spark-related-packages/scala-2.10.3.tgz


4. install and configure Hive, Spark & Scala

# edit hive-site.xml
add account and passport for Amazon RDS to retrive remote metadata of hive.

Successfully connect to RDS!

# edit bashrc
vim /home/hadoop/.bashrc
export SCALA_HOME=/home/hadoop/.versions/scala-2.10.3


# create_spark_env
vim /home/hadoop/spark/conf/spark-env.sh
export SPARK_MASTER_IP=10.218.180.250
export SCALA_HOME=/home/hadoop/.versions/scala-2.10.3
export SPARK_LOCAL_DIRS=/mnt/spark/
export
SPARK_CLASSPATH=/usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

# copy core site to spark and shark
cp /home/hadoop/conf/core-site.xml /home/hadoop/spark/conf/



5.Start spark
/home/hadoop/spark/sbin/start-master.sh

spark can read and write data in Amazon S3.

6. ./spark/bin/spark-shell --master spark://10.218.180.250:7077
--driver-class-path spark/lib/mysql-connector-java-5.1.26-bin.jar

7. error log

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
14/08/07 09:38:39 INFO Configuration.deprecation:
mapred.input.dir.recursive is deprecated. Instead, use
mapreduce.input.fileinputformat.input.dir.recursive
14/08/07 09:38:39 INFO Configuration.deprecation: mapred.max.split.size is
deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/08/07 09:38:39 INFO Configuration.deprecation: mapred.min.split.size is
deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/08/07 09:38:39 INFO Configuration.deprecation:
mapred.min.split.size.per.rack is deprecated. Instead, use
mapreduce.input.fileinputformat.split.minsize.per.rack
14/08/07 09:38:39 INFO Configuration.deprecation:
mapred.min.split.size.per.node is deprecated. Instead, use
mapreduce.input.fileinputformat.split.minsize.per.node
14/08/07 09:38:39 INFO Configuration.deprecation: mapred.reduce.tasks is
deprecated. Instead, use mapreduce.job.reduces
14/08/07 09:38:39 INFO Configuration.deprecation:
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
mapreduce.reduce.speculative
hiveContext: org.apache.spark.sql.hive.HiveContext =
org.apache.spark.sql.hive.HiveContext@45be296f

scala> import hiveContext._
import hiveContext._

scala> hql("show tables")
14/08/07 09:38:48 INFO parse.ParseDriver: Parsing command: show tables
14/08/07 09:38:48 INFO parse.ParseDriver: Parse Completed
14/08/07 09:38:48 INFO analysis.Analyzer: Max iterations (2) reached for
batch MultiInstanceRelations
14/08/07 09:38:48 INFO analysis.Analyzer: Max iterations (2) reached for
batch CaseInsensitiveAttributeReferences
14/08/07 09:38:48 INFO analysis.Analyzer: Max iterations (2) reached for
batch Check Analysis
14/08/07 09:38:48 INFO sql.SQLContext$$anon$1: Max iterations (2) reached
for batch Add exchange
14/08/07 09:38:48 INFO sql.SQLContext$$anon$1: Max iterations (2) reached
for batch Prepare Expressions
14/08/07 09:38:49 INFO Configuration.deprecation:
mapred.input.dir.recursive is deprecated. Instead, use
mapreduce.input.fileinputformat.input.dir.recursive
14/08/07 09:38:49 INFO ql.Driver: PERFLOG method=Driver.run
14/08/07 09:38:49 INFO ql.Driver: PERFLOG method=TimeToSubmit
14/08/07 09:38:49 INFO ql.Driver: PERFLOG method=compile
14/08/07 09:38:49 INFO ql.Driver: PERFLOG method=parse
14/08/07 09:38:49 INFO parse.ParseDriver: Parsing command: show tables
14/08/07 09:38:49 INFO parse.ParseDriver: Parse Completed
14/08/07 09:38:49 INFO ql.Driver: /PERFLOG method=parse
start=1407404329052 end=1407404329052 duration=0
14/08/07 09:38:49 INFO ql.Driver: PERFLOG method=semanticAnalyze
14/08/07 09:38:49 INFO ql.Driver: Semantic Analysis Completed
14/08/07 09:38:49 INFO ql.Driver: /PERFLOG method=semanticAnalyze
start=1407404329052 end=1407404329189 duration=137
14/08/07 09:38:49 INFO exec.ListSinkOperator: Initializing Self 0 OP
14/08/07 09:38:49 INFO exec.ListSinkOperator: Operator 0 OP initialized
14/08/07 09:38:49 INFO exec.ListSinkOperator: Initialization Done 0 OP
14/08/07 09:38:49 INFO ql.Driver: Returning Hive schema:
Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from
deserializer)], properties:null)
14/08/07 09:38:49 INFO ql.Driver: /PERFLOG method=compile
start

Re: Got error "java.lang.IllegalAccessError" when using HiveContext in Spark shell on AWS

2014-08-07 Thread Cheng Lian
Hey Zhun,

Thanks for the detailed problem description. Please see my comments inlined
below.

On Thu, Aug 7, 2014 at 6:18 PM, Zhun Shen shenzhunal...@gmail.com wrote:

Caused by: java.lang.IllegalAccessError: tried to access method
 com.google.common.collect.MapMaker.makeComputingMap(Lcom/google/common/base/Function;)Ljava/util/concurrent/ConcurrentMap;
 from class com.jolbox.bonecp.BoneCPDataSource

This line indicates that accessing MapMaker.makeComputingMap via Java
reflection fails. The version of Guava we used in Spark SQL (as a
transitive dependency) is 14.0.1. In this version, MapMaker.makeComputingMap
is still public
(https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/MapMaker.java?name=v14.0.1#581).
But in newer versions (say 15.0), it's no longer public
(https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/MapMaker.java?name=v15.0).

So my guess is that a newer version of the Guava library in your classpath
shadows the version Spark SQL uses somehow. A quick and dirty fix to see
whether this is true is to try putting the Guava 14.0.1 jar file at the
beginning of your classpath and see whether things work.
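
One way to try that, sketched with an assumed jar location; the extraClassPath
entries are prepended to the driver and executor classpaths:

val conf = new org.apache.spark.SparkConf()
  .set("spark.driver.extraClassPath", "/home/hadoop/lib/guava-14.0.1.jar")     // assumed path
  .set("spark.executor.extraClassPath", "/home/hadoop/lib/guava-14.0.1.jar")   // assumed path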

 at
 com.jolbox.bonecp.BoneCPDataSource.init(BoneCPDataSource.java:64)
 at
 org.datanucleus.store.rdbms.datasource.BoneCPDataSourceFactory.makePooledDataSource(BoneCPDataSourceFactory.java:73)
 at
 org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:217)
 at
 org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:110)
 at
 org.datanucleus.store.rdbms.ConnectionFactoryImpl.init(ConnectionFactoryImpl.java:82)
 ... 119 more

  ​


Re: Got error "java.lang.IllegalAccessError" when using HiveContext in Spark shell on AWS

2014-08-07 Thread Zhun Shen
Hi Cheng,

I replaced Guava 15.0 with Guava 14.0.1 in my spark classpath and the problem was
solved, so your method is correct. It proved that this issue was caused by the AWS
EMR (ami-version 3.1.0) libs, which include Guava 15.0.

Many thanks and see you in the first Spark User Beijing Meetup tomorrow.

--
Zhun Shen
Data Mining at LightnInTheBox.com
Email: shenzhunal...@gmail.com | shenz...@yahoo.com
Phone: 186 0627 7769
GitHub: https://github.com/shenzhun
LinkedIn: http://www.linkedin.com/in/shenzhun

On August 7, 2014 at 6:57:06 PM, Cheng Lian (lian.cs@gmail.com) wrote:

Hey Zhun,

Thanks for the detailed problem description. Please see my comments inlined 
below.

On Thu, Aug 7, 2014 at 6:18 PM, Zhun Shen shenzhunal...@gmail.com wrote:

Caused by: java.lang.IllegalAccessError: tried to access method 
com.google.common.collect.MapMaker.makeComputingMap(Lcom/google/common/base/Function;)Ljava/util/concurrent/ConcurrentMap;
 from class com.jolbox.bonecp.BoneCPDataSource
This line indicates that accessing MapMaker.makeComputingMap via Java reflection
fails. The version of Guava we used in Spark SQL (as a transitive dependency) is
14.0.1. In this version, MapMaker.makeComputingMap is still public. But in newer
versions (say 15.0), it's no longer public.

So my guess is that, a newer version of the Guava library in your classpath 
shadows the version Spark SQL uses somehow. A quick and dirty fix to see 
whether this is true is try putting Guava 14.0.1 jar file at the beginning of 
your classpath and see whether things work.

        at com.jolbox.bonecp.BoneCPDataSource.init(BoneCPDataSource.java:64)
        at 
org.datanucleus.store.rdbms.datasource.BoneCPDataSourceFactory.makePooledDataSource(BoneCPDataSourceFactory.java:73)
        at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:217)
        at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:110)
        at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.init(ConnectionFactoryImpl.java:82)
        ... 119 more

​