[ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127330#comment-16127330 ]
Mathias M. Andersen edited comment on SPARK-19019 at 8/15/17 2:51 PM:
----------------------------------------------------------------------

Just got this error post-fix on Spark 2.1:

{code:java}
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/opt/anaconda3/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/usr/hdp/current/spark-client/python/pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "/usr/hdp/current/spark-client/python/pyspark/context.py", line 33, in <module>
    from pyspark.java_gateway import launch_gateway
  File "/usr/hdp/current/spark-client/python/pyspark/java_gateway.py", line 25, in <module>
    import platform
  File "/opt/anaconda3/lib/python3.6/platform.py", line 886, in <module>
    "system node release version machine processor")
  File "/usr/hdp/current/spark-client/python/pyspark/serializers.py", line 381, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
{code}

> PySpark does not work with Python 3.6.0
> ---------------------------------------
>
>                 Key: SPARK-19019
>                 URL: https://issues.apache.org/jira/browse/SPARK-19019
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Critical
>             Fix For: 1.6.4, 2.0.3, 2.1.1, 2.2.0
>
> Currently, PySpark does not work with Python 3.6.0.
> Running {{./bin/pyspark}} simply throws the error as below:
> {code}
> Traceback (most recent call last):
>   File ".../spark/python/pyspark/shell.py", line 30, in <module>
>     import pyspark
>   File ".../spark/python/pyspark/__init__.py", line 46, in <module>
>     from pyspark.context import SparkContext
>   File ".../spark/python/pyspark/context.py", line 36, in <module>
>     from pyspark.java_gateway import launch_gateway
>   File ".../spark/python/pyspark/java_gateway.py", line 31, in <module>
>     from py4j.java_gateway import java_import, JavaGateway, GatewayClient
>   File "<frozen importlib._bootstrap>", line 961, in _find_and_load
>   File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
>   File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
>   File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
>   File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 18, in <module>
>   File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", line 62, in <module>
>     import pkgutil
>   File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", line 22, in <module>
>     ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
>   File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple
>     cls = _old_namedtuple(*args, **kwargs)
> TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
> {code}
> The problem is in
> https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394
> as the error says, and the cause seems to be that the arguments of
> {{namedtuple}} are completely keyword-only arguments as of Python 3.6.0
> (see https://bugs.python.org/issue25628).
> We currently copy this function via {{types.FunctionType}}, which does not set
> the default values of keyword-only arguments (meaning
> {{namedtuple.__kwdefaults__}}), and this seems to leave values missing
> inside the function (non-bound arguments).
> This ends up as below:
> {code}
> import types
> import collections
>
> def _copy_func(f):
>     return types.FunctionType(f.__code__, f.__globals__, f.__name__,
>         f.__defaults__, f.__closure__)
>
> _old_namedtuple = _copy_func(collections.namedtuple)
> {code}
> If we call as below:
> {code}
> >>> _old_namedtuple("a", "b")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
> {code}
> it throws an exception as above because {{__kwdefaults__}} for the required
> keyword arguments is unset in the copied function. So, if we give explicit
> values for these:
> {code}
> >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None)
> <class '__main__.a'>
> {code}
> it works fine.
> It seems we should now properly set these on the hijacked one.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
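A minimal sketch of the fix described in the issue body — copying {{__kwdefaults__}} onto the hijacked function so the keyword-only defaults survive — might look like the following. This mirrors the {{_copy_func}} shown above but is an illustration, not the exact patch that landed in {{serializers.py}}:

```python
import types
import collections


def _copy_func(f):
    """Copy a function object, preserving keyword-only defaults.

    types.FunctionType alone does not carry over __kwdefaults__, which
    Python 3.6+'s namedtuple() relies on for its keyword-only arguments
    ('verbose', 'rename', 'module' in 3.6).
    """
    fn = types.FunctionType(f.__code__, f.__globals__, f.__name__,
                            f.__defaults__, f.__closure__)
    # The crucial addition: carry over the defaults for keyword-only args.
    fn.__kwdefaults__ = f.__kwdefaults__
    return fn


_old_namedtuple = _copy_func(collections.namedtuple)

# With __kwdefaults__ set, the copied function no longer raises TypeError.
Point = _old_namedtuple("Point", ["x", "y"])
print(Point(1, 2))
```

With the extra assignment in place, calling {{_old_namedtuple("a", "b")}} with no explicit keyword arguments works, because the copied function now carries the same default values as the original.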