I have Python Jupyter notebooks set up to create a Spark context by default, and sometimes context creation fails with the following error:
 
18/04/30 18:03:27 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
18/04/30 18:03:27 ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 100 retries! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
I have tracked it down to two settings that might cause this in Spark 2.0.2 (client mode, standalone cluster, running in Kubernetes):
 
spark.driver.port - we don't set it, so it should be random
spark.ui.port - we set spark.ui.enabled=false, so Spark should not try to bind to this port
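
Since port 0 asks the OS for any free port, one possibility worth ruling out is that the address, rather than the port, is what cannot be bound. The following is a minimal sketch for checking that outside of Spark. It assumes (unconfirmed) that the driver binds to the address the container's hostname resolves to; `try_bind` is a hypothetical helper name, not part of Spark or PySpark.

```python
import errno
import socket


def try_bind(host, port=0):
    """Try to bind a TCP socket to (host, port); return (ok, detail)."""
    try:
        addr = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return False, "cannot resolve %r: %s" % (host, exc)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 lets the OS pick any free port, as Spark does by default.
        sock.bind((addr, port))
        return True, "bound %s:%d" % sock.getsockname()
    except socket.error as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, "bind to %s failed: %s (%s)" % (addr, name, exc)
    finally:
        sock.close()


if __name__ == "__main__":
    # Compare loopback against whatever the local hostname resolves to.
    for host in ("127.0.0.1", socket.gethostname()):
        print(host, try_bind(host))
```

If the hostname entry fails with EADDRNOTAVAIL while loopback succeeds, that would point at name resolution inside the container rather than at either port setting.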
 
In short, I do not know which setting Spark gets confused about, and from reading the Spark code it is not clear how spark.ui.port could cause this, even though the error message lists it as a possible cause.
 
Question 1: Have you seen this before?
Question 2: How do I trace the Spark driver process? It seems that I can only call sc.setLogLevel after the Spark context is created, but I need tracing before the context is created.
 
I created a log4j.properties file in the spark/conf directory and set it to TRACE, but it only gets picked up when I run a Scala Jupyter notebook, not when I run a Python Jupyter notebook. I haven't been able to find out how to turn on the same level of tracing for a Spark driver process started from a Python Jupyter notebook.
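
One approach that might apply here is passing the log4j configuration to the driver JVM through the `PYSPARK_SUBMIT_ARGS` environment variable, which PySpark reads when it launches the JVM. The sketch below only builds and sets that variable; the file path is an assumption based on the classpath shown in the process listing, and `build_submit_args` is a hypothetical helper name.

```python
import os


def build_submit_args(log4j_path):
    """Return a PYSPARK_SUBMIT_ARGS value pointing the driver JVM at log4j_path."""
    java_opts = "-Dlog4j.configuration=file:" + log4j_path
    # PySpark expects the value to end with "pyspark-shell".
    return '--driver-java-options "%s" pyspark-shell' % java_opts


# Assumed location; adjust to wherever the TRACE-level log4j.properties lives.
LOG4J_PATH = "/usr/local/spark/conf/log4j.properties"

# Must be in the environment before the SparkContext (and its JVM) is started.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(LOG4J_PATH)
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

Because the notebook creates the Spark context by default, setting the variable in a notebook cell is likely too late; it would probably have to go into the kernel's environment (for example the `env` section of the kernel's `kernel.json`) so that it is present when the JVM starts.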
 
Some things I looked at:
 
`SPARK_PRINT_LAUNCH_COMMAND=1 /usr/local/spark-2.0.2-bin-hadoop2.7/bin/pyspark`

Spark Command: python2.7
========================================
Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Spark Command: **/usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /usr/local/spark/conf/**:/usr/local/spark/jars/* -Xmx1g  org.apache.spark.deploy.SparkSubmit --name PySparkShell pyspark-shell
 
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    0  1308  1308  1308 ?         1416 Ss       0   0:00 bash
 1308  1416  1416  1308 ?         1416 R+       0   0:00  \_ ps axjf
    0  1151  1151  1151 ?         1151 Ss+      0   0:00 bash
    0     1     1     1 ?           -1 Ss       0   0:00 /bin/bash /usr/local/bin/start-dsx-notebook.sh
    1  1014     1     1 ?           -1 S        0   0:00 /bin/sh /user-home/.scripts/publishing-startup-scripts/nbexec_py_startup.sh
 1014  1026     1     1 ?           -1 S        0   0:06  \_ python /user-home/.scripts/system/publishing-api/py2http.py
    1  1017     1     1 ?           -1 S        0   0:00 su -l 1001 /usr/local/bin/start-user-notebook.sh spark-master-svc:7077 dsx /user-home/1001/DSX_Projects/imagemgmt 1523893891668 imagemgmt Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InRlc3QiLCJzdWIiOiJ0ZXN0IiwiaXNzIjoiS05PWFNTTyIsImF1ZCI6IkRTWCIsInJvbGUiOiJBZG1pbiIsInVpZCI6IjEwMDEiLCJpYXQiOjE1MjQ2MDIxMTd9.jHyjakD4G7XlOJ3Q1e5We3agHy_dtao_U98rZcLuTNBgGaETYKfHO2PC-94HG_nxIcTjDxymefWHItiwO7QcTIg_sIkP4uPSfQMTFthrMWNUucR0xRWJxFPcYgLlKo3T2P8JmA_LslVWqFD_MMjmYHI3UukVRj319_MSsRTW3Md3quF5mmv3OZMVjuI8faKMQF7zt_17W_QbNZAT91F0AboXJ7iazz71vcsuZZx0OxnSzJzcW3AEYb8JFWz3opbRwpc3dswbLco8TJ6I4DtacBq7syv3zg0bLIIcHSCp-LBwHrTyCWV7uJ0a3m-MSdvwdZ35WYE6_8LRwadKfW6hiw 1001
 1017  1018  1018  1018 ?           -1 Ss    1001   0:00  \_ -su /usr/local/bin/start-user-notebook.sh spark-master-svc:7077 dsx /user-home/1001/DSX_Projects/imagemgmt 1523893891668 imagemgmt Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InRlc3QiLCJzdWIiOiJ0ZXN0IiwiaXNzIjoiS05PWFNTTyIsImF1ZCI6IkRTWCIsInJvbGUiOiJBZG1pbiIsInVpZCI6IjEwMDEiLCJpYXQiOjE1MjQ2MDIxMTd9.jHyjakD4G7XlOJ3Q1e5We3agHy_dtao_U98rZcLuTNBgGaETYKfHO2PC-94HG_nxIcTjDxymefWHItiwO7QcTIg_sIkP4uPSfQMTFthrMWNUucR0xRWJxFPcYgLlKo3T2P8JmA_LslVWqFD_MMjmYHI3UukVRj319_MSsRTW3Md3quF5mmv3OZMVjuI8faKMQF7zt_17W_QbNZAT91F0AboXJ7iazz71vcsuZZx0OxnSzJzcW3AEYb8JFWz3opbRwpc3dswbLco8TJ6I4DtacBq7syv3zg0bLIIcHSCp-LBwHrTyCWV7uJ0a3m-MSdvwdZ35WYE6_8LRwadKfW6hiw 1001
 1018  1025  1018  1018 ?           -1 Sl    1001   0:51      \_ /opt/conda/bin/python /opt/conda/bin/jupyter-notebook --NotebookApp.token= --port=8888 --no-browser
 1025  1033  1033  1033 ?           -1 Ssl   1001   0:03          \_ python -m ipykernel_launcher -f /user-home/1001/.local/share/jupyter/runtime/kernel-a11fc3e6-cf9d-4e6f-afe2-728ba48fb0bd.json
 1033  1054  1033  1033 ?           -1 Sl    1001   0:28          |   \_ /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Xmx1g -Djavax.net.ssl.trustStore=/user-home/_global_/security/customer-truststores/cacerts org.apache.spark.deploy.SparkSubmit pyspark-shell
 1025  1167  1167  1167 ?           -1 Ssl   1001   2:07          \_ /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Xmx1g -Djavax.net.ssl.trustStore=/user-home/_global_/security/customer-truststores/cacerts org.apache.spark.deploy.SparkSubmit --class org.apache.toree.Main /opt/conda/share/jupyter/kernels/apache_toree_scala/lib/toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar --profile /user-home/1001/.local/share/jupyter/runtime/kernel-1a4d3565-a32e-4d80-875b-93da83451a3c.json
 1025  1318  1318  1318 ?           -1 Ssl   1001   0:02          \_ /opt/conda/lib/R/bin/exec/R --slave -e IRkernel::main() --args /user-home/1001/.local/share/jupyter/runtime/kernel-f037058c-013a-4eae-b554-4073585c9172.json
    1  1328  1318  1318 ?           -1 Sl    1001   0:09 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Xmx1g -Djavax.net.ssl.trustStore=/user-home/_global_/security/customer-truststores/cacerts org.apache.spark.deploy.SparkSubmit sparkr-shell /tmp/RtmpyKF5Yq/backend_port5266fef1bd1
 
 
Regards,
 
Mihai Iacob
DSX Local - Security, IBM Analytics
