Hi Zhan,

I don’t use HiveContext features at all; I mostly use the DataFrame API. It is sexier and involves much less typing. :) Also, HiveContext requires a metastore database setup (Derby by default). The problem is that I cannot have two spark-shell sessions running at the same time on the same host from the same directory (e.g. /home/jerry). It gives me an exception like the one below.
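(A sketch of one possible workaround, based on the fact that embedded Derby allows only one JVM to hold its database at a time and that spark-shell creates `metastore_db` relative to the current working directory. The `/tmp/metastore_db_$$` path is just an illustrative choice; I believe the `spark.hadoop.*` prefix forwards the property to the Hive configuration, but treat this as an assumption to verify:)

```
# Give each spark-shell session its own embedded Derby metastore path,
# so the single-connection limit of embedded Derby no longer collides.
# $$ is the shell's PID, making the path unique per session.
spark-shell --conf "spark.hadoop.javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db_$$;create=true"
```

Alternatively, simply launching each spark-shell from a different working directory should avoid the lock, since the embedded `metastore_db` directory is created in the current working directory.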
Since I don’t use HiveContext, I don’t see the need to maintain a database. What is interesting is that the pyspark shell is able to start more than one session at the same time. I wonder what pyspark has done better than spark-shell?

Best Regards,

Jerry

> On Nov 6, 2015, at 5:28 PM, Zhan Zhang <zzh...@hortonworks.com> wrote:
>
> If your assembly jar has the Hive jars included, the HiveContext will be used.
> Typically, HiveContext has more functionality than SQLContext. In what case
> do you have to use SQLContext for something that cannot be done by HiveContext?
>
> Thanks.
>
> Zhan Zhang
>
> On Nov 6, 2015, at 10:43 AM, Jerry Lam <chiling...@gmail.com> wrote:
>
>> What is interesting is that the pyspark shell works fine with multiple sessions
>> on the same host even though multiple HiveContexts have been created. What
>> does pyspark do differently in terms of starting up the shell?
>>
>>> On Nov 6, 2015, at 12:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> In SQLContext.scala:
>>>
>>>   // After we have populated SQLConf, we call setConf to populate other
>>>   // confs in the subclass (e.g. hiveconf in HiveContext).
>>>   properties.foreach {
>>>     case (key, value) => setConf(key, value)
>>>   }
>>>
>>> I don't see a config for skipping the above call.
>>>
>>> FYI
>>>
>>> On Fri, Nov 6, 2015 at 8:53 AM, Jerry Lam <chiling...@gmail.com> wrote:
>>> Hi spark users and developers,
>>>
>>> Is it possible to disable HiveContext from being instantiated when using
>>> spark-shell? I get the following errors when more than one session
>>> starts. Since I don't use HiveContext, it would be great if I could have more
>>> than one spark-shell started at the same time.
>>>
>>> Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>>     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>>     at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>     at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:183)
>>>     at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
>>>     at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:226)
>>>     at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185)
>>>     at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:392)
>>>     at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:235)
>>>     at org.apache.spark.sql.SQLContext$$anonfun$5.apply(SQLContext.scala:234)
>>>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>>>     at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>>>     at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:234)
>>>     at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:72)
>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>     at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
>>>     at org.apache.spark.repl.SparkILoopExt.importSpark(SparkILoopExt.scala:154)
>>>     at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply$mcZ$sp(SparkILoopExt.scala:127)
>>>     at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply(SparkILoopExt.scala:113)
>>>     at org.apache.spark.repl.SparkILoopExt$$anonfun$process$1.apply(SparkILoopExt.scala:113)
>>>
>>> Best Regards,
>>>
>>> Jerry
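(For reference, a sketch of working with a plain SQLContext in a Spark 1.x spark-shell. This sidesteps Hive for your own queries, but note it does not stop spark-shell itself from instantiating a HiveContext at startup, which is what triggers the Derby lock in the first place. The `people.json` path is a hypothetical example file:)

```
// Build a plain SQLContext so no Hive metastore is involved in queries.
// Inside spark-shell, `sc` is the pre-created SparkContext.
import org.apache.spark.sql.SQLContext

val sqlCtx = new SQLContext(sc)
val df = sqlCtx.read.json("people.json")  // hypothetical input file
df.printSchema()
```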