Ah, looks like it's this change: https://github.com/apache/spark/commit/b3417b731d4e323398a0d7ec6e86405f4464f4f9#diff-3b5463566251d5b09fd328738a9e9bc5
It seems strange that by default Spark doesn't build with Hive, but by default PySpark requires it... This may also be a behavior change for PySpark users who build Spark without Hive: the old behavior was "fall back to non-Hive support" and the new behavior is "program won't start". (A build example and a sketch of the old fall-back pattern are at the end of this message.)

On Thu, Jun 14, 2018 at 11:51 AM, Sean Owen <sro...@gmail.com> wrote:

> I think you would have to build with the 'hive' profile? But if so, that
> would have been true for a while now.
>
> On Thu, Jun 14, 2018 at 10:38 AM Li Jin <ice.xell...@gmail.com> wrote:
>
>> Hey all,
>>
>> I just did a clean checkout of github.com/apache/spark but failed to
>> start PySpark. This is what I did:
>>
>> git clone g...@github.com:apache/spark.git; cd spark; build/sbt package;
>> bin/pyspark
>>
>> And got this exception:
>>
>> (spark-dev) Lis-MacBook-Pro:spark icexelloss$ bin/pyspark
>> Python 3.6.3 |Anaconda, Inc.| (default, Nov 8 2017, 18:10:31)
>> [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
>> Type "help", "copyright", "credits" or "license" for more information.
>> 18/06/14 11:34:14 WARN NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>> Setting default log level to "WARN".
>> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
>> setLogLevel(newLevel).
>> /Users/icexelloss/workspace/upstream2/spark/python/pyspark/shell.py:45:
>> UserWarning: Failed to initialize Spark session.
>>   warnings.warn("Failed to initialize Spark session.")
>> Traceback (most recent call last):
>>   File "/Users/icexelloss/workspace/upstream2/spark/python/pyspark/shell.py",
>> line 41, in <module>
>>     spark = SparkSession._create_shell_session()
>>   File "/Users/icexelloss/workspace/upstream2/spark/python/pyspark/sql/session.py",
>> line 564, in _create_shell_session
>>     SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
>> TypeError: 'JavaPackage' object is not callable
>>
>> I also tried deleting the Hadoop deps from my ivy2 cache and reinstalling
>> them, but no luck. I wonder:
>>
>> 1. I have not seen this before; could this be caused by a recent change
>> to head?
>> 2. Am I doing something wrong in the build process?
>>
>> Thanks much!
>> Li
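P.S. For anyone else hitting this on master: per Sean's suggestion, the workaround is to include Hive in the build. A minimal example, assuming the sbt build accepts the same Maven-style profiles as build/mvn (which the building-spark docs describe):

    build/sbt -Phive package
    bin/pyspark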
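And to illustrate the behavior change I mean, here is a minimal sketch of the old "fall back to non-Hive support" pattern. This is not the actual shell.py code, just the pattern, using only the public SparkSession builder API:

    # Sketch of the old fall-back behavior (NOT the actual shell.py code).
    from pyspark.sql import SparkSession

    try:
        # enableHiveSupport() needs the Hive classes on the classpath;
        # in a build without -Phive this raises at getOrCreate().
        spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    except Exception:
        # Old behavior: quietly fall back to a plain, non-Hive session.
        spark = SparkSession.builder.getOrCreate()

After the linked commit, the equivalent path in _create_shell_session raises instead of falling back, which is the TypeError in Li's traceback.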