Are there objections to restoring the behavior for PySpark users? I am happy to submit a patch; a rough sketch of what the restored fallback could look like is at the bottom of this message.

On Thu, Jun 14, 2018 at 12:15 PM Reynold Xin <r...@databricks.com> wrote:
> The behavior change is not good...
>
> On Thu, Jun 14, 2018 at 9:05 AM Li Jin <ice.xell...@gmail.com> wrote:
>
>> Ah, looks like it's this change:
>>
>> https://github.com/apache/spark/commit/b3417b731d4e323398a0d7ec6e86405f4464f4f9#diff-3b5463566251d5b09fd328738a9e9bc5
>>
>> It seems strange that by default Spark doesn't build with Hive, but by
>> default PySpark requires it...
>>
>> This might also be a behavior change for PySpark users who build Spark
>> without Hive. The old behavior is "fall back to non-hive support" and the
>> new behavior is "program won't start".
>>
>> On Thu, Jun 14, 2018 at 11:51 AM, Sean Owen <sro...@gmail.com> wrote:
>>
>>> I think you would have to build with the 'hive' profile? But if so, that
>>> would have been true for a while now.
>>>
>>> On Thu, Jun 14, 2018 at 10:38 AM Li Jin <ice.xell...@gmail.com> wrote:
>>>
>>>> Hey all,
>>>>
>>>> I just did a clean checkout of github.com/apache/spark but failed to
>>>> start PySpark. This is what I did:
>>>>
>>>> git clone g...@github.com:apache/spark.git; cd spark; build/sbt
>>>> package; bin/pyspark
>>>>
>>>> And got this exception:
>>>>
>>>> (spark-dev) Lis-MacBook-Pro:spark icexelloss$ bin/pyspark
>>>> Python 3.6.3 |Anaconda, Inc.| (default, Nov 8 2017, 18:10:31)
>>>> [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> 18/06/14 11:34:14 WARN NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>> Using Spark's default log4j profile:
>>>> org/apache/spark/log4j-defaults.properties
>>>> Setting default log level to "WARN".
>>>> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
>>>> setLogLevel(newLevel).
>>>> /Users/icexelloss/workspace/upstream2/spark/python/pyspark/shell.py:45:
>>>> UserWarning: Failed to initialize Spark session.
>>>>   warnings.warn("Failed to initialize Spark session.")
>>>> Traceback (most recent call last):
>>>>   File
>>>> "/Users/icexelloss/workspace/upstream2/spark/python/pyspark/shell.py", line
>>>> 41, in <module>
>>>>     spark = SparkSession._create_shell_session()
>>>>   File
>>>> "/Users/icexelloss/workspace/upstream2/spark/python/pyspark/sql/session.py",
>>>> line 564, in _create_shell_session
>>>>     SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
>>>> TypeError: 'JavaPackage' object is not callable
>>>>
>>>> I also tried to delete the Hadoop deps from my ivy2 cache and reinstall
>>>> them, but no luck. I wonder:
>>>>
>>>> 1. I have not seen this before; could this be caused by a recent
>>>> change to head?
>>>> 2. Am I doing something wrong in the build process?
>>>>
>>>> Thanks much!
>>>> Li
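
For context, here is a rough sketch of what the restored fallback could look like. It mirrors the old probe-and-fall-back logic at shell start-up (the HiveConf line in the traceback above), but the function name, the internal helpers (SparkContext._ensure_initialized, SparkContext._jvm), and the exact exceptions caught are just my illustration, not the final patch:

import warnings

from py4j.protocol import Py4JError
from pyspark.context import SparkContext
from pyspark.sql import SparkSession


def create_shell_session():
    """Probe for Hive classes; fall back to a plain SparkSession if absent."""
    SparkContext._ensure_initialized()  # make the JVM gateway available
    try:
        # Same probe as the failing line in the traceback: raises TypeError
        # ("'JavaPackage' object is not callable") when HiveConf is not on
        # the classpath, i.e. Spark was built without Hive support.
        SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
        return SparkSession.builder.enableHiveSupport().getOrCreate()
    except (Py4JError, TypeError):
        warnings.warn("Hive classes not found; falling back to a SparkSession "
                      "without Hive support (in-memory catalog).")
        return (SparkSession.builder
                .config("spark.sql.catalogImplementation", "in-memory")
                .getOrCreate())

At shell start this would be called as spark = create_shell_session() instead of failing hard. Until something like that lands, building with the 'hive' profile that Sean mentioned (I believe build/sbt -Phive package, though I have not double-checked the exact flag) should put HiveConf on the classpath and avoid the TypeError.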