Sounds good. Thanks all for the quick replies. https://issues.apache.org/jira/browse/SPARK-24563
On Thu, Jun 14, 2018 at 12:19 PM, Xiao Li <gatorsm...@gmail.com> wrote:
> Thanks for catching this. Please feel free to submit a PR. I do not think
> Vanzin wants to introduce behavior changes in that PR. We should do the
> code review more carefully.
>
> Xiao
>
> 2018-06-14 9:18 GMT-07:00 Li Jin <ice.xell...@gmail.com>:
>
>> Are there objections to restoring the behavior for PySpark users? I am
>> happy to submit a patch.
>>
>> On Thu, Jun 14, 2018 at 12:15 PM Reynold Xin <r...@databricks.com> wrote:
>>
>>> The behavior change is not good...
>>>
>>> On Thu, Jun 14, 2018 at 9:05 AM Li Jin <ice.xell...@gmail.com> wrote:
>>>
>>>> Ah, looks like it's this change:
>>>> https://github.com/apache/spark/commit/b3417b731d4e323398a0d7ec6e86405f4464f4f9#diff-3b5463566251d5b09fd328738a9e9bc5
>>>>
>>>> It seems strange that by default Spark doesn't build with Hive, but by
>>>> default PySpark requires it...
>>>>
>>>> This might also be a behavior change for PySpark users who build Spark
>>>> without Hive. The old behavior is "fall back to non-Hive support" and
>>>> the new behavior is "the program won't start".
>>>>
>>>> On Thu, Jun 14, 2018 at 11:51 AM, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> I think you would have to build with the 'hive' profile? But if so,
>>>>> that would have been true for a while now.
>>>>>
>>>>> On Thu, Jun 14, 2018 at 10:38 AM Li Jin <ice.xell...@gmail.com> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> I just did a clean checkout of github.com/apache/spark but failed to
>>>>>> start PySpark. This is what I did:
>>>>>>
>>>>>> git clone git@github.com:apache/spark.git; cd spark; build/sbt package; bin/pyspark
>>>>>>
>>>>>> And got this exception:
>>>>>>
>>>>>> (spark-dev) Lis-MacBook-Pro:spark icexelloss$ bin/pyspark
>>>>>> Python 3.6.3 |Anaconda, Inc.| (default, Nov 8 2017, 18:10:31)
>>>>>> [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
>>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> 18/06/14 11:34:14 WARN NativeCodeLoader: Unable to load native-hadoop
>>>>>> library for your platform... using builtin-java classes where applicable
>>>>>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>>>>>> Setting default log level to "WARN".
>>>>>> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
>>>>>> setLogLevel(newLevel).
>>>>>> /Users/icexelloss/workspace/upstream2/spark/python/pyspark/shell.py:45:
>>>>>> UserWarning: Failed to initialize Spark session.
>>>>>>   warnings.warn("Failed to initialize Spark session.")
>>>>>> Traceback (most recent call last):
>>>>>>   File "/Users/icexelloss/workspace/upstream2/spark/python/pyspark/shell.py",
>>>>>> line 41, in <module>
>>>>>>     spark = SparkSession._create_shell_session()
>>>>>>   File "/Users/icexelloss/workspace/upstream2/spark/python/pyspark/sql/session.py",
>>>>>> line 564, in _create_shell_session
>>>>>>     SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
>>>>>> TypeError: 'JavaPackage' object is not callable
>>>>>>
>>>>>> I also tried deleting the Hadoop deps from my ivy2 cache and
>>>>>> reinstalling them, but no luck. I wonder:
>>>>>>
>>>>>> 1. I have not seen this before; could this be caused by a recent
>>>>>> change to head?
>>>>>> 2. Am I doing something wrong in the build process?
>>>>>>
>>>>>> Thanks much!
>>>>>> Li
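
For reference, this is roughly what the old "fall back to non-Hive support"
behavior discussed above looks like. A minimal sketch, not the actual
SPARK-24563 patch; the helper name is hypothetical, and it leans on the
private SparkContext._jvm / _ensure_initialized API:

    from py4j.protocol import Py4JError

    from pyspark.context import SparkContext
    from pyspark.sql import SparkSession

    def create_shell_session_with_fallback():
        # Private API: make sure the JVM gateway is up before probing it.
        SparkContext._ensure_initialized()
        try:
            # On a build without the 'hive' profile, HiveConf resolves to a
            # bare 'JavaPackage', and calling it raises TypeError -- the
            # exact traceback quoted above.
            SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
            return SparkSession.builder.enableHiveSupport().getOrCreate()
        except (Py4JError, TypeError):
            # Hive classes are missing: degrade to a session backed by the
            # in-memory catalog instead of refusing to start.
            return SparkSession.builder.getOrCreate()

The alternative, as Sean suggests, is to build with Hive support in the first
place (e.g. build/sbt -Phive package), which puts HiveConf on the classpath
and avoids the fallback entirely.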