[ https://issues.apache.org/jira/browse/SPARK-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065047#comment-14065047 ]
Matthew Farrellee commented on SPARK-1670:
------------------------------------------

SPARK-2313 is the root cause of this. A workaround would be complex because the extra text on stdout comes from the same JVM that should produce the Py4J port (a sketch of one possible mitigation follows the quoted issue below).

> PySpark Fails to Create SparkContext Due To Debugging Options in conf/java-opts
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-1670
>                 URL: https://issues.apache.org/jira/browse/SPARK-1670
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.0.0
>         Environment: pats-air:spark pat$ IPYTHON=1 bin/pyspark
>                      Python 2.7.5 (default, Aug 25 2013, 00:04:04)
>                      ...
>                      IPython 1.1.0
>                      ...
>                      Spark version 1.0.0-SNAPSHOT
>                      Using Python version 2.7.5 (default, Aug 25 2013 00:04:04)
>            Reporter: Pat McDonough
>
> When JVM debugging options are in conf/java-opts, PySpark fails to create the SparkContext. The java-opts file looks like the following:
> {code}
> -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
> {code}
> Here's the error:
> {code}
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> /Library/Python/2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
>     202             else:
>     203                 filename = fname
> --> 204             __builtin__.execfile(filename, *where)
>
> /Users/pat/Projects/spark/python/pyspark/shell.py in <module>()
>      41     SparkContext.setSystemProperty("spark.executor.uri", os.environ["SPARK_EXECUTOR_URI"])
>      42
> ---> 43 sc = SparkContext(os.environ.get("MASTER", "local[*]"), "PySparkShell", pyFiles=add_files)
>      44
>      45 print("""Welcome to
>
> /Users/pat/Projects/spark/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway)
>      92         tempNamedTuple = namedtuple("Callsite", "function file linenum")
>      93         self._callsite = tempNamedTuple(function=None, file=None, linenum=None)
> ---> 94         SparkContext._ensure_initialized(self, gateway=gateway)
>      95
>      96         self.environment = environment or {}
>
> /Users/pat/Projects/spark/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
>     172         with SparkContext._lock:
>     173             if not SparkContext._gateway:
> --> 174                 SparkContext._gateway = gateway or launch_gateway()
>     175             SparkContext._jvm = SparkContext._gateway.jvm
>     176             SparkContext._writeToFile = SparkContext._jvm.PythonRDD.writeToFile
>
> /Users/pat/Projects/spark/python/pyspark/java_gateway.pyc in launch_gateway()
>      44     proc = Popen(command, stdout=PIPE, stdin=PIPE)
>      45     # Determine which ephemeral port the server started on:
> ---> 46     port = int(proc.stdout.readline())
>      47     # Create a thread to echo output from the GatewayServer, which is required
>      48     # for Java log output to show up:
>
> ValueError: invalid literal for int() with base 10: 'Listening for transport dt_socket at address: 5005\n'
> {code}
> Note that when you use JVM debugging, the very first line of output (e.g. when running spark-shell) looks like this:
> {code}
> Listening for transport dt_socket at address: 5005
> {code}
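For illustration, here is a minimal sketch of the naive workaround: instead of trusting the first stdout line from the gateway JVM, scan lines until one parses as an integer, skipping the JDWP banner. The helper name read_gateway_port is hypothetical, not Spark's actual code or the eventual SPARK-2313 fix:

{code}
from subprocess import Popen, PIPE

def read_gateway_port(proc):
    # Hypothetical helper, not Spark's actual fix for SPARK-2313.
    # `proc` is assumed to be the Popen handle for the Py4J gateway JVM,
    # started with stdout=PIPE. With JDWP enabled, a banner line such as
    # "Listening for transport dt_socket at address: 5005" precedes the
    # line on which the GatewayServer prints its ephemeral port, so skip
    # any line that does not parse as an integer.
    while True:
        line = proc.stdout.readline()
        if not line:
            # EOF: the JVM exited without ever printing a port.
            raise RuntimeError("gateway JVM exited before reporting its port")
        try:
            return int(line)
        except ValueError:
            continue  # banner/log chatter, not the port line

# Usage, mirroring launch_gateway() in pyspark/java_gateway.py:
#   proc = Popen(command, stdout=PIPE, stdin=PIPE)
#   port = read_gateway_port(proc)
{code}

Note the fragility, which is the point of the comment above: any stray log line that happens to be a bare integer would be misread as the port, and the extra text cannot simply be suppressed because it is emitted by the same JVM that must report the port.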