[ https://issues.apache.org/jira/browse/SPARK-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232598#comment-14232598 ]
Yu Ishikawa commented on SPARK-3910:
------------------------------------

I had the same problem as Tomohiko. However, I resolved it by removing all `*.pyc` files under the `python/` directory:

{noformat}
cd $SPARK_HOME && find python -name "*.pyc" -delete
{noformat}

If this is indeed the correct fix, then in my opinion there are two ways to resolve this issue:
1. At a minimum, remove all `*.pyc` files under the `python` directory when running `python/run-tests`.
2. Resolve the cyclic import.

Thanks

> ./python/pyspark/mllib/classification.py doctests fails with module name
> pollution
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-3910
>                 URL: https://issues.apache.org/jira/browse/SPARK-3910
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Mac OS X 10.9.5, Python 2.6.8, Java 1.8.0_20,
> Jinja2==2.7.3, MarkupSafe==0.23, Pygments==1.6, Sphinx==1.2.3,
> argparse==1.2.1, docutils==0.12, flake8==2.2.3, mccabe==0.2.1, numpy==1.9.0,
> pep8==1.5.7, psutil==2.1.3, pyflake8==0.1.9, pyflakes==0.8.1,
> unittest2==0.5.1, wsgiref==0.1.2
>            Reporter: Tomohiko K.
>              Labels: pyspark, testing
>
> In the ./python/run-tests script, we run the doctests in
> ./pyspark/mllib/classification.py.
> The output is as follows:
> {noformat}
> $ ./python/run-tests
> ...
> Running test: pyspark/mllib/classification.py
> Traceback (most recent call last):
>   File "pyspark/mllib/classification.py", line 20, in <module>
>     import numpy
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/__init__.py", line 170, in <module>
>     from . import add_newdocs
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/add_newdocs.py", line 13, in <module>
>     from numpy.lib import add_newdoc
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/__init__.py", line 8, in <module>
>     from .type_check import *
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/lib/type_check.py", line 11, in <module>
>     import numpy.core.numeric as _nx
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/core/__init__.py", line 46, in <module>
>     from numpy.testing import Tester
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/__init__.py", line 13, in <module>
>     from .utils import *
>   File "/Users/tomohiko/.virtualenvs/pyspark_py26/lib/python2.6/site-packages/numpy/testing/utils.py", line 15, in <module>
>     from tempfile import mkdtemp
>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/tempfile.py", line 34, in <module>
>     from random import Random as _Random
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/mllib/random.py", line 24, in <module>
>     from pyspark.rdd import RDD
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/__init__.py", line 51, in <module>
>     from pyspark.context import SparkContext
>   File "/Users/tomohiko/MyRepos/Scala/spark/python/pyspark/context.py", line 22, in <module>
>     from tempfile import NamedTemporaryFile
> ImportError: cannot import name NamedTemporaryFile
> 0.07 real 0.04 user 0.02 sys
> Had test failures; see logs.
> {noformat}
> The problem is a cyclic import involving the tempfile module.
> The cause is that the pyspark.mllib.random module exists in the same
> directory as the pyspark.mllib.classification module.
> The classification module imports numpy, and numpy in turn imports
> tempfile internally.
> Now the first entry of sys.path is the directory "./python/pyspark/mllib"
> (where the executed file "classification.py" exists), so tempfile imports
> the pyspark.mllib.random module (not the standard library "random" module).
> Finally, the import chain reaches tempfile again, and a cyclic import is
> formed.
> Summary: classification → numpy → tempfile → pyspark.mllib.random →
> tempfile → (cyclic import!!)
> Furthermore, stat is a standard library module, and a pyspark.mllib.stat
> module also exists. This may cause trouble as well.
> commit: 0e8203f4fb721158fb27897680da476174d24c4b
> A fundamental solution is to avoid using module names that are used by the
> standard library (currently "random" and "stat").
> A difficulty with this solution is that pyspark.mllib.random and
> pyspark.mllib.stat would have to be renamed, and they may already be in use.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
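The sys.path shadowing described in the issue can be reproduced outside Spark with a minimal Python sketch (the scratch directory and the `SHADOW` marker attribute are illustrative, not part of the issue): a module named `random.py` placed at the front of sys.path hides the standard library's `random`, just as `python/pyspark/mllib/random.py` does when the doctest runner puts that directory first.

```python
import os
import sys
import tempfile

# Create a scratch directory containing a module named "random", mimicking
# python/pyspark/mllib/random.py sitting at the front of sys.path.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "random.py"), "w") as f:
    f.write("SHADOW = True\n")

sys.path.insert(0, scratch)
sys.modules.pop("random", None)  # forget any previously imported stdlib copy

import random  # now resolves to scratch/random.py, not the stdlib module

shadowed = hasattr(random, "SHADOW")
print(shadowed)  # True: the stdlib "random" is hidden, as in the traceback

# Clean up so the rest of the interpreter session is unaffected.
sys.path.remove(scratch)
sys.modules.pop("random", None)
```

In the reported failure the shadow module is imported mid-way through tempfile's own initialization, which is why the chain comes back around to a half-initialized tempfile and raises the ImportError; deleting stale `*.pyc` files removes leftover compiled shadows after a rename, but only renaming the modules removes the collision itself.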