Karthik Palaniappan created SPARK-24668:
-------------------------------------------
Summary: PySpark crashes when getting the webui url if the webui is disabled
Key: SPARK-24668
URL: https://issues.apache.org/jira/browse/SPARK-24668
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.3.0
Environment: * Spark 2.3.0
* Spark-on-YARN
* Java 8
* Python 2
* Jupyter
Reporter: Karthik Palaniappan

Repro: Evaluate `sc` in a Jupyter notebook:

{{---------------------------------------------------------------------------}}
{{Py4JJavaError Traceback (most recent call last)}}
{{/opt/conda/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)}}
{{ 343 method = get_real_method(obj, self.print_method)}}
{{ 344 if method is not None:}}
{{--> 345 return method()}}
{{ 346 return None}}
{{ 347 else:}}
{{/usr/lib/spark/python/pyspark/context.py in _repr_html_(self)}}
{{ 261 </div>}}
{{ 262 """.format(}}
{{--> 263 sc=self}}
{{ 264 )}}
{{ 265 }}
{{/usr/lib/spark/python/pyspark/context.py in uiWebUrl(self)}}
{{ 373 def uiWebUrl(self):}}
{{ 374 """Return the URL of the SparkUI instance started by this SparkContext"""}}
{{--> 375 return self._jsc.sc().uiWebUrl().get()}}
{{ 376 }}
{{ 377 @property}}
{{/usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in __call__(self, *args)}}
{{ 1158 answer = self.gateway_client.send_command(command)}}
{{ 1159 return_value = get_return_value(}}
{{-> 1160 answer, self.gateway_client, self.target_id, self.name)}}
{{ 1161 }}
{{ 1162 for temp_arg in temp_args:}}
{{/usr/lib/spark/python/pyspark/sql/utils.py in deco(*a, **kw)}}
{{ 61 def deco(*a, **kw):}}
{{ 62 try:}}
{{---> 63 return f(*a, **kw)}}
{{ 64 except py4j.protocol.Py4JJavaError as e:}}
{{ 65 s = e.java_exception.toString()}}
{{/usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)}}
{{ 318 raise Py4JJavaError(}}
{{ 319 "An error occurred while calling \{0}{1}\{2}.\n".}}
{{--> 320 format(target_id, ".", name), value)}}
{{ 321 else:}}
{{ 322 raise Py4JError(}}
{{Py4JJavaError: An error occurred while calling o80.get.}}
{{: java.util.NoSuchElementException: None.get}}
{{ at scala.None$.get(Option.scala:347)}}
{{ at scala.None$.get(Option.scala:345)}}
{{ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
{{ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
{{ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
{{ at java.lang.reflect.Method.invoke(Method.java:498)}}
{{ at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)}}
{{ at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)}}
{{ at py4j.Gateway.invoke(Gateway.java:282)}}
{{ at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)}}
{{ at py4j.commands.CallCommand.execute(CallCommand.java:79)}}
{{ at py4j.GatewayConnection.run(GatewayConnection.java:214)}}
{{ at java.lang.Thread.run(Thread.java:748)}}

PySpark only prints the web UI URL in `_repr_html_`, not `__repr__`, so this only happens in notebooks that render HTML, not in the pyspark shell: [https://github.com/apache/spark/commit/f654b39a63d4f9b118733733c7ed2a1b58649e3d]

Disabling Spark's UI with `spark.ui.enabled` *is* valuable outside of tests. A couple of reasons come to mind:
1) If you run multiple Spark applications from one machine, Spark irritatingly starts at the same port (4040) as the first application, then increments (4041, 4042, etc.) until it finds an open port. If you are running 10 Spark apps, the 11th prints 10 warnings about ports being taken before it finally finds one.
2) You can serve the Spark web UI from a dedicated Spark history server instead of per-driver.
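The port-probing behavior in reason 1 can be sketched as follows. This is a simplified illustration, not Spark's actual code (the real retry logic lives on the Scala side, and the function name here is invented):

```python
# Simplified sketch of how a UI server probes for a free port (reason 1
# above): start at 4040 and increment until a bind succeeds. Names are
# illustrative, not Spark's API.
import socket

def find_ui_port(base_port=4040, max_retries=16):
    for offset in range(max_retries):
        port = base_port + offset
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                # Port is free: this is where the UI would bind.
                return port
            except OSError:
                # Port taken by an earlier app: this is where Spark
                # logs a warning, then tries the next port.
                continue
    raise RuntimeError("no free port in range %d-%d"
                       % (base_port, base_port + max_retries - 1))
```

With ten applications already holding 4040 through 4049, the eleventh call walks through ten failed binds (one warning each) before settling on 4050, which is exactly the warning spam described above.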
This is documented here, at least for Spark-on-YARN: [https://spark.apache.org/docs/latest/running-on-yarn.html#using-the-spark-history-server-to-replace-the-spark-web-ui].

PySpark should not crash if the web UI is disabled. There are a couple of options:
1) SparkContext#uiWebUrl() in Scala should return the driver web UI URL or the history server URL, depending on which one is being used.
2) PySpark should call getOrElse(None) rather than get().

I strongly prefer option 1), but I can't figure out how to do it in a non-hacky way. In SparkContext.scala, uiWebUrl() comes from `_ui.map(_.webUrl)`, where `_ui` contains the actual SparkUI only if spark.ui.enabled=true. Two approaches:
1) I could set `_ui` to SparkUI.createHistoryUI() and simply avoid calling `bind()` on the UI server. I'm not sure what the implications would be for classes outside of SparkContext that use SparkContext#ui.
2) I could make `_ui` and `uiWebUrl()` inconsistent: `_ui` would only contain the in-driver UI, while `uiWebUrl()` would return the in-driver or history URL.

I would appreciate some help figuring out how to proceed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
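Option 2 above (guard the `Option` on the PySpark side instead of calling `get()`) can be sketched as follows. `ScalaOption` here is a hypothetical stand-in for the `scala.Option` proxy that py4j actually hands back; only the `isDefined()`/`get()` pair mirrors the real Scala API, and the real fix would live in `pyspark/context.py`'s `uiWebUrl` property:

```python
# Hypothetical stand-in for the scala.Option proxy that py4j returns from
# sc.uiWebUrl(); only isDefined()/get() mirror the real Scala Option API.
class ScalaOption:
    def __init__(self, value=None):
        self._value = value

    def isDefined(self):
        return self._value is not None

    def get(self):
        if self._value is None:
            # Mirrors java.util.NoSuchElementException: None.get
            raise RuntimeError("None.get")
        return self._value

def ui_web_url(opt):
    """Option 2 from above: return None instead of crashing when no UI exists."""
    return opt.get() if opt.isDefined() else None
```

With the UI disabled, `ui_web_url(ScalaOption())` returns `None`, so `_repr_html_` could render the context without a UI link instead of raising `Py4JJavaError`.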