Jay: It would be nice if you could patch Spark with the PR below and give it a try.
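For reference, the underlying issue is an initialization-order one: Hive’s SessionState is a thread-local that must be started before it can be fetched. A minimal standalone illustration (a sketch against Hive’s SessionState API, not the actual ClientWrapper code path):

import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.session.SessionState

object SessionStateOrdering {
  def main(args: Array[String]): Unit = {
    // No SessionState.start() has run on this thread yet, so the
    // thread-local is unset: get() returns null, and get().getConf
    // is exactly the kind of call that throws the NPE.
    assert(SessionState.get() == null)

    // Once start() has run, the same lookup succeeds.
    SessionState.start(new SessionState(new HiveConf()))
    assert(SessionState.get().getConf != null)
  }
}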
Thanks

On Wed, Feb 3, 2016 at 6:03 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Created a pull request:
> https://github.com/apache/spark/pull/11066
>
> FYI
>
> On Wed, Feb 3, 2016 at 1:27 PM, Shipper, Jay [USA] <shipper_...@bah.com> wrote:
>
>> It was just renamed recently: https://github.com/apache/spark/pull/10981
>>
>> As SessionState is entirely managed by Spark’s code, it still seems like
>> this is a bug with Spark 1.6.0, and not with how our application is using
>> HiveContext. But I’d feel more confident filing a bug if someone else could
>> confirm they’re having this issue with Spark 1.6.0. Ideally, we should also
>> have some simple proof of concept that can be posted with the bug.
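>>
>> Something like the following is what I have in mind for the proof of
>> concept (an untested sketch; the JDBC URL, table name, and database are
>> placeholders for whatever external, non-Hive database is handy):
>>
>> import java.util.Properties
>> import org.apache.spark.{SparkConf, SparkContext}
>> import org.apache.spark.sql.hive.HiveContext
>>
>> object HiveContextNpePoC {
>>   def main(args: Array[String]): Unit = {
>>     val sc = new SparkContext(
>>       new SparkConf().setAppName("HiveContextNpePoC").setMaster("local[*]"))
>>     // No hive-site.xml anywhere on the classpath, mirroring our setup.
>>     val ctx = new HiveContext(sc)
>>     // read().jdbc() against an external database (not Hive) forces the
>>     // lazy Hive client initialization that NPEs on 1.6.0.
>>     val df = ctx.read.jdbc(
>>       "jdbc:postgresql://dbhost:5432/testdb", "some_table", new Properties())
>>     df.show()
>>   }
>> }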
>>
>> From: Ted Yu <yuzhih...@gmail.com>
>> Date: Wednesday, February 3, 2016 at 3:57 PM
>> To: Jay Shipper <shipper_...@bah.com>
>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>> Subject: Re: [External] Re: Spark 1.6.0 HiveContext NPE
>>
>> In ClientWrapper.scala, the SessionState.get().getConf call might have been
>> executed ahead of SessionState.start(state) at line 194.
>>
>> This was the JIRA:
>>
>> [SPARK-10810] [SPARK-10902] [SQL] Improve session management in SQL
>>
>> In the master branch, there is no ClientWrapper.scala anymore.
>>
>> FYI
>>
>> On Wed, Feb 3, 2016 at 11:15 AM, Shipper, Jay [USA] <shipper_...@bah.com> wrote:
>>
>>> One quick update on this: the NPE is not happening with Spark 1.5.2, so
>>> this problem seems specific to Spark 1.6.0.
>>>
>>> From: Jay Shipper <shipper_...@bah.com>
>>> Date: Wednesday, February 3, 2016 at 12:06 PM
>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: Re: [External] Re: Spark 1.6.0 HiveContext NPE
>>>
>>> Right, I could already tell that from the stack trace and from looking at
>>> Spark’s code. What I’m trying to determine is why that’s coming back as
>>> null now, just from upgrading Spark to 1.6.0.
>>>
>>> From: Ted Yu <yuzhih...@gmail.com>
>>> Date: Wednesday, February 3, 2016 at 12:04 PM
>>> To: Jay Shipper <shipper_...@bah.com>
>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: [External] Re: Spark 1.6.0 HiveContext NPE
>>>
>>> Looks like the NPE came from this line:
>>>
>>> def conf: HiveConf = SessionState.get().getConf
>>>
>>> Meaning SessionState.get() returned null.
>>>
>>> On Wed, Feb 3, 2016 at 8:33 AM, Shipper, Jay [USA] <shipper_...@bah.com> wrote:
>>>
>>>> I’m upgrading an application from Spark 1.4.1 to Spark 1.6.0, and I’m
>>>> getting a NullPointerException from HiveContext. It’s happening while it
>>>> tries to load some tables via JDBC from an external database (not Hive),
>>>> using context.read().jdbc():
>>>>
>>>> —
>>>> java.lang.NullPointerException
>>>> at org.apache.spark.sql.hive.client.ClientWrapper.conf(ClientWrapper.scala:205)
>>>> at org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:552)
>>>> at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:551)
>>>> at org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:538)
>>>> at org.apache.spark.sql.hive.HiveContext$$anonfun$configure$1.apply(HiveContext.scala:537)
>>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>> at scala.collection.immutable.List.foreach(List.scala:318)
>>>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>>> at org.apache.spark.sql.hive.HiveContext.configure(HiveContext.scala:537)
>>>> at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:250)
>>>> at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
>>>> at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:457)
>>>> at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:457)
>>>> at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:456)
>>>> at org.apache.spark.sql.hive.HiveContext$$anon$3.<init>(HiveContext.scala:473)
>>>> at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:473)
>>>> at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:472)
>>>> at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
>>>> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
>>>> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
>>>> at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442)
>>>> at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:223)
>>>> at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
>>>> —
>>>>
>>>> Even though the application is not using Hive, HiveContext is used instead
>>>> of SQLContext for the additional functionality it provides. There’s no
>>>> hive-site.xml for the application, but this did not cause an issue with
>>>> Spark 1.4.1.
>>>>
>>>> Does anyone have an idea about what’s changed from 1.4.1 to 1.6.0 that
>>>> could explain this NPE? The only obvious change I’ve noticed in HiveContext
>>>> is that the default warehouse location is different (1.4.1: the current
>>>> directory; 1.6.0: /user/hive/warehouse), but I verified that this NPE
>>>> happens even when /user/hive/warehouse exists and is readable/writable by
>>>> the application. In terms of changes to the application to work with Spark
>>>> 1.6.0, the only one that might be relevant to this issue is the upgrade of
>>>> the Hadoop dependencies to match what Spark 1.6.0 uses
>>>> (2.6.0-cdh5.7.0-SNAPSHOT).
>>>>
>>>> Thanks,
>>>> Jay