Reamer commented on code in PR #4936:
URL: https://github.com/apache/zeppelin/pull/4936#discussion_r2121065097
##########
spark-submit/src/main/java/org/apache/zeppelin/spark/submit/SparkSubmitInterpreter.java:
##########
@@ -37,32 +36,72 @@
 /**
- * Support %spark-submit which run spark-submit command. Internally,
- * it would run shell command via ShellInterpreter.
- *
+ * Interpreter that supports the `%spark-submit` command in Apache Zeppelin.
+ * <p>
+ * This interpreter allows users to submit Spark jobs using the standard `spark-submit` CLI
+ * interface.
+ * Internally, it delegates execution to the ShellInterpreter to run `spark-submit` as a shell
+ * command.
+ * <p>
+ * Key features:
+ * - Automatically builds and executes the `spark-submit` command using the configured SPARK_HOME
+ *   path.
+ * - Extracts the Spark UI URL from logs and publishes it to the Zeppelin frontend.
+ * - Tracks the YARN Application ID from the logs, allowing the job to be cancelled via
+ *   `yarn application -kill`.
+ * - Handles both YARN and local Spark modes.
+ * <p>
+ * Required configuration:
+ * - SPARK_HOME must be set in the interpreter setting or environment variables. It should point
+ *   to the root directory of a valid Spark installation.
+ * <p>
+ * Example usage in a Zeppelin notebook:
+ * %spark-submit --class org.apache.spark.examples.SparkPi /path/to/jar spark-args
 */
 public class SparkSubmitInterpreter extends ShellInterpreter {
   private static final Logger LOGGER = LoggerFactory.getLogger(SparkSubmitInterpreter.class);
-  private String sparkHome;
-
-  // paragraphId --> yarnAppId
+  private final String sparkHome;
   private ConcurrentMap<String, String> yarnAppIdMap = new ConcurrentHashMap<>();
   public SparkSubmitInterpreter(Properties property) {
     super(property);
-    // Set time to be max integer so that the shell process won't timeout.
-    setProperty("shell.command.timeout.millisecs", Integer.MAX_VALUE + "");
-    this.sparkHome = properties.getProperty("SPARK_HOME");
+    setProperty("shell.command.timeout.millisecs", String.valueOf(Integer.MAX_VALUE));
+    this.sparkHome = property.getProperty("SPARK_HOME");
     LOGGER.info("SPARK_HOME: {}", sparkHome);
   }
+  /**
+   * Executes a spark-submit command based on the user's input in a Zeppelin notebook paragraph.
+   * <p>
+   * This method constructs the full spark-submit CLI command using the configured SPARK_HOME and
+   * the provided arguments. It performs validation (e.g., SPARK_HOME presence), logs the
+   * execution, and registers a listener to extract Spark UI information from the output logs.
+   * <p>
+   * If SPARK_HOME is not set, an error result is returned.
+   * After execution, any associated YARN application ID is removed from the internal tracking map.
+   *
+   * @param cmd     The spark-submit arguments entered by the user (e.g., "--class ... /path/to/jar").
+   * @param context The interpreter context for the current paragraph execution.
+   * @return An {@link InterpreterResult} representing the outcome of the spark-submit execution.
+   */
   @Override
   public InterpreterResult internalInterpret(String cmd, InterpreterContext context) {
     if (StringUtils.isBlank(cmd)) {
       return new InterpreterResult(InterpreterResult.Code.SUCCESS);
     }
+
+    if (StringUtils.isBlank(sparkHome)) {
+      String errorMsg = "SPARK_HOME is not set. Please configure SPARK_HOME in the interpreter " +
+          "setting or environment.";
+      LOGGER.error("Failed to run spark-submit: {}", errorMsg);
+      return new InterpreterResult(InterpreterResult.Code.ERROR, errorMsg);
+    }
+
     String sparkSubmitCommand = sparkHome + "/bin/spark-submit " + cmd.trim();
Review Comment:
I think it is important that we also check the existence of the spark-submit
command. Something like this (it also gets rid of the hard-coded path
separators):
```
File sparkSubmit = Paths.get(sparkHome, "bin", "spark-submit").toFile();
if (sparkSubmit.exists()) {
```
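Expanded into the surrounding method, the check could look roughly like this
(a sketch only; the error message wording and the reuse of the existing error
pattern are illustrative, not part of the PR):
```
// Hypothetical sketch, assuming java.io.File and java.nio.file.Paths are imported.
// Paths.get() joins the segments with the platform's separator, so no "/" literals.
File sparkSubmit = Paths.get(sparkHome, "bin", "spark-submit").toFile();
if (!sparkSubmit.exists()) {
  String errorMsg = "spark-submit not found at: " + sparkSubmit.getAbsolutePath();
  LOGGER.error("Failed to run spark-submit: {}", errorMsg);
  return new InterpreterResult(InterpreterResult.Code.ERROR, errorMsg);
}
String sparkSubmitCommand = sparkSubmit.getAbsolutePath() + " " + cmd.trim();
```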
##########
spark-submit/src/main/java/org/apache/zeppelin/spark/submit/SparkSubmitInterpreter.java:
##########
@@ -37,32 +36,72 @@
 /**
- * Support %spark-submit which run spark-submit command. Internally,
- * it would run shell command via ShellInterpreter.
- *
+ * Interpreter that supports the `%spark-submit` command in Apache Zeppelin.
+ * <p>
+ * This interpreter allows users to submit Spark jobs using the standard `spark-submit` CLI
+ * interface.
+ * Internally, it delegates execution to the ShellInterpreter to run `spark-submit` as a shell
+ * command.
+ * <p>
+ * Key features:
+ * - Automatically builds and executes the `spark-submit` command using the configured SPARK_HOME
+ *   path.
+ * - Extracts the Spark UI URL from logs and publishes it to the Zeppelin frontend.
+ * - Tracks the YARN Application ID from the logs, allowing the job to be cancelled via
+ *   `yarn application -kill`.
+ * - Handles both YARN and local Spark modes.
+ * <p>
+ * Required configuration:
+ * - SPARK_HOME must be set in the interpreter setting or environment variables. It should point
+ *   to the root directory of a valid Spark installation.
+ * <p>
+ * Example usage in a Zeppelin notebook:
+ * %spark-submit --class org.apache.spark.examples.SparkPi /path/to/jar spark-args
 */
 public class SparkSubmitInterpreter extends ShellInterpreter {
   private static final Logger LOGGER = LoggerFactory.getLogger(SparkSubmitInterpreter.class);
-  private String sparkHome;
-
-  // paragraphId --> yarnAppId
Review Comment:
Please leave the comment in the code.
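For example, restore it above the new field declaration (field as in the PR,
only the comment added back):
```
// paragraphId --> yarnAppId
private ConcurrentMap<String, String> yarnAppIdMap = new ConcurrentHashMap<>();
```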
##########
spark-submit/src/main/java/org/apache/zeppelin/spark/submit/SparkSubmitInterpreter.java:
##########
@@ -37,32 +36,72 @@
 /**
- * Support %spark-submit which run spark-submit command. Internally,
- * it would run shell command via ShellInterpreter.
- *
+ * Interpreter that supports the `%spark-submit` command in Apache Zeppelin.
+ * <p>
+ * This interpreter allows users to submit Spark jobs using the standard `spark-submit` CLI
+ * interface.
+ * Internally, it delegates execution to the ShellInterpreter to run `spark-submit` as a shell
+ * command.
+ * <p>
+ * Key features:
+ * - Automatically builds and executes the `spark-submit` command using the configured SPARK_HOME
+ *   path.
+ * - Extracts the Spark UI URL from logs and publishes it to the Zeppelin frontend.
+ * - Tracks the YARN Application ID from the logs, allowing the job to be cancelled via
+ *   `yarn application -kill`.
+ * - Handles both YARN and local Spark modes.
+ * <p>
+ * Required configuration:
+ * - SPARK_HOME must be set in the interpreter setting or environment variables. It should point
+ *   to the root directory of a valid Spark installation.
+ * <p>
+ * Example usage in a Zeppelin notebook:
+ * %spark-submit --class org.apache.spark.examples.SparkPi /path/to/jar spark-args
 */
 public class SparkSubmitInterpreter extends ShellInterpreter {
   private static final Logger LOGGER = LoggerFactory.getLogger(SparkSubmitInterpreter.class);
-  private String sparkHome;
-
-  // paragraphId --> yarnAppId
+  private final String sparkHome;
   private ConcurrentMap<String, String> yarnAppIdMap = new ConcurrentHashMap<>();
   public SparkSubmitInterpreter(Properties property) {
     super(property);
-    // Set time to be max integer so that the shell process won't timeout.
-    setProperty("shell.command.timeout.millisecs", Integer.MAX_VALUE + "");
-    this.sparkHome = properties.getProperty("SPARK_HOME");
+    setProperty("shell.command.timeout.millisecs", String.valueOf(Integer.MAX_VALUE));
+    this.sparkHome = property.getProperty("SPARK_HOME");
Review Comment:
The properties are passed all the way through to the `Interpreter` class.
https://github.com/apache/zeppelin/blob/1468714b5b187bafe83ff0bf13cac3be50642c4e/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/Interpreter.java#L156-L158
We should adopt the `Interpreter` class's variable name, `properties`, here as
well.
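Concretely, the constructor would then read (a sketch; only the parameter name
changes, behavior is identical):
```
public SparkSubmitInterpreter(Properties properties) {
  super(properties);
  setProperty("shell.command.timeout.millisecs", String.valueOf(Integer.MAX_VALUE));
  this.sparkHome = properties.getProperty("SPARK_HOME");
  LOGGER.info("SPARK_HOME: {}", sparkHome);
}
```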