[ https://issues.apache.org/jira/browse/SPARK-34100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264510#comment-17264510 ]
Hyukjin Kwon commented on SPARK-34100: -------------------------------------- If this is fixed upstream, it's best to identify the ticket and port back instead of filing new JIRA to request backport. I am resolving this ticket for now but it would be great if we can identify the ticket that fixed this issue. > pyspark 2.4 packages can't be installed via pip on Amazon Linux 2 > ----------------------------------------------------------------- > > Key: SPARK-34100 > URL: https://issues.apache.org/jira/browse/SPARK-34100 > Project: Spark > Issue Type: Bug > Components: Deploy, PySpark > Affects Versions: 2.4.7 > Environment: Amazon Linux 2, with Python 3.7.9 and pip 9.0.3 (also > tested with pip 20.3.3), using Docker or EMR 5.32.0 > > Example Dockerfile to reproduce: > {{FROM amazonlinux:2}} > {{RUN yum install -y python3}} > {{RUN pip3 install pyspark==2.4.7}} > > Reporter: Devin Boyer > Priority: Minor > > I'm unable to install the pyspark Python package on Amazon Linux 2, whether > in a Docker image or an EMR cluster. Amazon Linux 2 currently ships with > Python 3.7 and pip 9.0.3, but upgrading pip yields the same result. > > When installing the package, the installation will fail with the error > "ValueError: bad marshal data (unknown type code)". Full example stack below. > > This bug prevents use of pyspark for simple testing environments, and from > using tools where the pyspark package is a dependency, like > [https://github.com/awslabs/python-deequ.] > > Stack Trace: > {{Step 3/3 : RUN pip3 install pyspark==2.4.7}} > {{ ---> Running in 2c6e1c1de62f}} > {{WARNING: Running pip install with root privileges is generally not a good > idea. Try `pip3 install --user` instead.}} > {{Collecting pyspark==2.4.7}} > {{ Downloading > https://files.pythonhosted.org/packages/e2/06/29f80e5a464033432eedf89924e7aa6ebbc47ce4dcd956853a73627f2c07/pyspark-2.4.7.tar.gz > (217.9MB)}} > {{ Complete output from command python setup.py egg_info:}} > {{ Could not import pypandoc - required to package PySpark}} > {{ /usr/lib64/python3.7/distutils/dist.py:274: UserWarning: Unknown > distribution option: 'long_description_content_type'}} > {{ warnings.warn(msg)}} > {{ zip_safe flag not set; analyzing archive contents...}} > {{ Traceback (most recent call last):}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 154, > in save_modules}} > {{ yield saved}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, > in setup_context}} > {{ yield}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 250, > in run_setup}} > {{ _execfile(setup_script, ns)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 45, in > _execfile}} > {{ exec(code, globals, locals)}} > {{ File "/tmp/easy_install-l742j64w/pypandoc-1.5/setup.py", line 111, in > <module>}} > {{ # using Python imports instead which will be resolved correctly.}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 129, > in setup}} > {{ return distutils.core.setup(**attrs)}} > {{ File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup}} > {{ dist.run_commands()}} > {{ File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands}} > {{ self.run_command(cmd)}} > {{ File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command}} > {{ cmd_obj.run()}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 218, in run}} > {{ os.path.join(archive_root, 'EGG-INFO'), self.zip_safe()}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 269, in zip_safe}} > {{ return analyze_egg(self.bdist_dir, self.stubs)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 379, in analyze_egg}} > {{ safe = scan_module(egg_dir, base, name, stubs) and safe}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 416, in scan_module}} > {{ code = marshal.load(f)}} > {{ ValueError: bad marshal data (unknown type code)}}{{During handling of the > above exception, another exception occurred:}}{{Traceback (most recent call > last):}} > {{ File "<string>", line 1, in <module>}} > {{ File "/tmp/pip-build-j3d56a0n/pyspark/setup.py", line 224, in <module>}} > {{ 'Programming Language :: Python :: Implementation :: PyPy']}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 128, > in setup}} > {{ _install_setup_requires(attrs)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 123, > in _install_setup_requires}} > {{ dist.fetch_build_eggs(dist.setup_requires)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/dist.py", line 461, in > fetch_build_eggs}} > {{ replace_conflicting=True,}} > {{ File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line > 866, in resolve}} > {{ replace_conflicting=replace_conflicting}} > {{ File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line > 1146, in best_match}} > {{ return self.obtain(req, installer)}} > {{ File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line > 1158, in obtain}} > {{ return installer(requirement)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/dist.py", line 528, in > fetch_build_egg}} > {{ return cmd.easy_install(req)}} > {{ File > "/usr/lib/python3.7/site-packages/setuptools/command/easy_install.py", line > 672, in easy_install}} > {{ return self.install_item(spec, dist.location, tmpdir, deps)}} > {{ File > "/usr/lib/python3.7/site-packages/setuptools/command/easy_install.py", line > 698, in install_item}} > {{ dists = self.install_eggs(spec, download, tmpdir)}} > {{ File > "/usr/lib/python3.7/site-packages/setuptools/command/easy_install.py", line > 881, in install_eggs}} > {{ return self.build_and_install(setup_script, setup_base)}} > {{ File > "/usr/lib/python3.7/site-packages/setuptools/command/easy_install.py", line > 1149, in build_and_install}} > {{ self.run_setup(setup_script, setup_base, args)}} > {{ File > "/usr/lib/python3.7/site-packages/setuptools/command/easy_install.py", line > 1135, in run_setup}} > {{ run_setup(setup_script, args)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 253, > in run_setup}} > {{ raise}} > {{ File "/usr/lib64/python3.7/contextlib.py", line 130, in __exit__}} > {{ self.gen.throw(type, value, traceback)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, > in setup_context}} > {{ yield}} > {{ File "/usr/lib64/python3.7/contextlib.py", line 130, in __exit__}} > {{ self.gen.throw(type, value, traceback)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 166, > in save_modules}} > {{ saved_exc.resume()}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 141, > in resume}} > {{ six.reraise(type, exc, self._tb)}} > {{ File "/usr/lib/python3.7/site-packages/pkg_resources/_vendor/six.py", line > 685, in reraise}} > {{ raise value.with_traceback(tb)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 154, > in save_modules}} > {{ yield saved}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 195, > in setup_context}} > {{ yield}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 250, > in run_setup}} > {{ _execfile(setup_script, ns)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/sandbox.py", line 45, in > _execfile}} > {{ exec(code, globals, locals)}} > {{ File "/tmp/easy_install-l742j64w/pypandoc-1.5/setup.py", line 111, in > <module>}} > {{ # using Python imports instead which will be resolved correctly.}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 129, > in setup}} > {{ return distutils.core.setup(**attrs)}} > {{ File "/usr/lib64/python3.7/distutils/core.py", line 148, in setup}} > {{ dist.run_commands()}} > {{ File "/usr/lib64/python3.7/distutils/dist.py", line 966, in run_commands}} > {{ self.run_command(cmd)}} > {{ File "/usr/lib64/python3.7/distutils/dist.py", line 985, in run_command}} > {{ cmd_obj.run()}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 218, in run}} > {{ os.path.join(archive_root, 'EGG-INFO'), self.zip_safe()}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 269, in zip_safe}} > {{ return analyze_egg(self.bdist_dir, self.stubs)}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 379, in analyze_egg}} > {{ safe = scan_module(egg_dir, base, name, stubs) and safe}} > {{ File "/usr/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", > line 416, in scan_module}} > {{ code = marshal.load(f)}} > {{ ValueError: bad marshal data (unknown type code)}} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org