[ https://issues.apache.org/jira/browse/SPARK-24668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795907#comment-16795907 ]
Nikolay Kashtanov commented on SPARK-24668:
-------------------------------------------

[~hyukjin.kwon] could you please assign it to me?

> PySpark crashes when getting the webui url if the webui is disabled
> -------------------------------------------------------------------
>
>                 Key: SPARK-24668
>                 URL: https://issues.apache.org/jira/browse/SPARK-24668
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0, 2.4.0
>         Environment: * Spark 2.3.0
> * Spark-on-YARN
> * Java 8
> * Python 3.6.5
> * Jupyter 4.4.0
>            Reporter: Karthik Palaniappan
>            Priority: Minor
>
> Repro:
>
> Evaluate `sc` in a Jupyter notebook:
>
> {{---------------------------------------------------------------------------}}
> {{Py4JJavaError                             Traceback (most recent call last)}}
> {{/opt/conda/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)}}
> {{    343             method = get_real_method(obj, self.print_method)}}
> {{    344             if method is not None:}}
> {{--> 345                 return method()}}
> {{    346             return None}}
> {{    347         else:}}
> {{/usr/lib/spark/python/pyspark/context.py in _repr_html_(self)}}
> {{    261         </div>}}
> {{    262         """.format(}}
> {{--> 263             sc=self}}
> {{    264         )}}
> {{    265 }}
> {{/usr/lib/spark/python/pyspark/context.py in uiWebUrl(self)}}
> {{    373     def uiWebUrl(self):}}
> {{    374         """Return the URL of the SparkUI instance started by this SparkContext"""}}
> {{--> 375         return self._jsc.sc().uiWebUrl().get()}}
> {{    376 }}
> {{    377     @property}}
> {{/usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in __call__(self, *args)}}
> {{   1158         answer = self.gateway_client.send_command(command)}}
> {{   1159         return_value = get_return_value(}}
> {{-> 1160             answer, self.gateway_client, self.target_id, self.name)}}
> {{   1161 }}
> {{   1162         for temp_arg in temp_args:}}
> {{/usr/lib/spark/python/pyspark/sql/utils.py in deco(*a, **kw)}}
> {{     61     def deco(*a, **kw):}}
> {{     62         try:}}
> {{---> 63             return f(*a, **kw)}}
> {{     64         except py4j.protocol.Py4JJavaError as e:}}
> {{     65             s = e.java_exception.toString()}}
> {{/usr/lib/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)}}
> {{    318                 raise Py4JJavaError(}}
> {{    319                     "An error occurred while calling {0}{1}{2}.\n".}}
> {{--> 320                     format(target_id, ".", name), value)}}
> {{    321             else:}}
> {{    322                 raise Py4JError(}}
> {{Py4JJavaError: An error occurred while calling o80.get.}}
> {{: java.util.NoSuchElementException: None.get}}
> {{    at scala.None$.get(Option.scala:347)}}
> {{    at scala.None$.get(Option.scala:345)}}
> {{    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
> {{    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
> {{    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
> {{    at java.lang.reflect.Method.invoke(Method.java:498)}}
> {{    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)}}
> {{    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)}}
> {{    at py4j.Gateway.invoke(Gateway.java:282)}}
> {{    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)}}
> {{    at py4j.commands.CallCommand.execute(CallCommand.java:79)}}
> {{    at py4j.GatewayConnection.run(GatewayConnection.java:214)}}
> {{    at java.lang.Thread.run(Thread.java:748)}}
>
> PySpark only prints out the web UI URL in `_repr_html_`, not `__repr__`, so this only happens in notebooks that render HTML, not in the pyspark shell:
> [https://github.com/apache/spark/commit/f654b39a63d4f9b118733733c7ed2a1b58649e3d]
>
> Disabling Spark's UI with `spark.ui.enabled` *is* valuable outside of tests.
> A couple of reasons come to mind:
> 1) If you run multiple Spark applications from one machine, Spark irritatingly starts by picking the same port (4040) as the first application, then increments (4041, 4042, etc.) until it finds an open port. If you are running 10 Spark apps, the 11th prints out 10 warnings about ports being taken before it finally finds one.
> 2) You can serve the Spark web UI from a dedicated Spark history server instead of per-driver. This is documented here, at least for Spark-on-YARN:
> [https://spark.apache.org/docs/latest/running-on-yarn.html#using-the-spark-history-server-to-replace-the-spark-web-ui]
>
> PySpark should not crash if the web UI is disabled. There are a couple of options:
> 1) SparkContext#uiWebUrl() in Scala should return the driver web UI URL or the history server URL, depending on which one is in use.
> 2) PySpark should call getOrElse(None) rather than get().
>
> I strongly prefer option 1), but I can't figure out how to do it in a non-hacky way. In SparkContext.scala, uiWebUrl() comes from `_ui.map(_.webUrl)`, where `_ui` contains the actual SparkUI if spark.ui.enabled=true.
> 1) I could set `_ui` to SparkUI.createHistoryUI() and simply avoid calling `bind()` on the UI server. I'm not sure what the implications would be for classes outside of SparkContext that use SparkContext#ui.
> 2) I could make `_ui` and `uiWebUrl()` inconsistent: `_ui` would only contain the in-driver UI, while `uiWebUrl()` would return the in-driver or history URL.
>
> I would appreciate some help figuring out how to proceed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org